Abstract

Robots and robot controllers are becoming more sophisticated. Consequently,
the demands on the controller's operating system are increasing.
The  lower  levels  of  robot  control  systems  (indeed,  most  real-time  control 
systems)  are  characterized  by  servo  loops.  This  thesis  examines  servo  loops 
and  how  they  affect  data  communications  within  robot  control  systems.  In 
the  two  systems  described  in  this  thesis  the  special  characteristics  of  servo 
loops  are  exploited  to  enhance  the  data  communications. 

HIC is an operating system for hierarchies of servo loops. It uses rate
monotonic scheduling for the periodic servo loop processes. HIC events
(or processes) which are used to implement servo loops are not allowed to
block. They will only surrender the processor upon completion or when
preempted by a higher priority process. A non-blocking communication
structure, Periodic Data Buffers (PDBs), was developed for inter-process
communication. HIC has been implemented and is used successfully in a
controller for the Utah/MIT hand.

GANGLIA is a proposed real-time communication network. It is intended
to allow the processors in a robot controller to be distributed within the
robot. Thus the processors can be close to the sensors and actuators they
control. Much of the traffic on such a network would be periodic. GANGLIA
uses a central controller which allocates access to the network. For the
periodic traffic a fixed schedule, produced off-line, is used. For the
aperiodic traffic round-robin polling is used. Unlike most protocols,
messages do not contain the address of the destination node. Instead, each
message is labeled with the "name" of its contents. Each node examines each
message and decides whether or not it is interested in the message. A special
communication controller in each node (the Communication Memory Management
Unit) examines and selects the messages. The result of this protocol is a
network-wide common memory. In this thesis, the GANGLIA protocol is
described in detail and some preliminary analysis of its effectiveness in
some real robot systems is given.
The most prominent characteristic of robot control programs is that they
must meet real-time constraints. Robots often operate in restricted
environments but the real world cannot be kept entirely at bay. Gravity,
unexpected obstacles, mechanical imprecision, and mechanical failure are
features of the environment that a robot and its controller must cope with.
Consider the
problem  of  a  robot  grasping  a  delicate  object.  The  robot  must  continuously 
monitor  the  forces  it  applies  to  the  object  in  order  to  avoid  damaging  the 
object  with  too  much  force  or  dropping  the  object  by  not  applying  enough 
force.  In  this  situation  the  controller  must  detect  excessive  or  insufficient 
force and respond before the object is damaged or gravity pulls the object
away.  It  is  sometimes  possible  to  develop  a  special  gripper  that  would,  by 
its  design,  prevent  the  object  from  slipping  or  being  damaged.  However, 
this  limits  the  generality  of  the  robot  gripper  and  thus  the  tasks  that  can 
be performed. Real-time constraints also arise in the normal operation of
robots, not just under exceptional conditions such as a slipping object.
Moving a robot arm along a smooth path to a specific point requires frequent
comparison of actual joint positions to target positions and, if the motion
is to be smooth, the controller must respond in a timely manner to the
deviations. A robot controller must be able to operate within both kinds of
time constraints to be effective. Robot controllers are not the only area
where real-time constraints appear. Real-time systems are also used in
avionics, space applications, process control, laboratory instrumentation,
and other fields. Much of the work presented in this thesis is applicable to
these other areas; however, we will focus on robotics applications.

The  study  of  real-time  systems  is  developing  into  a  field  in  its  own  right 
[Stan88].  What  separates  real-time  computing  from  traditional  computing 
on a theoretical level is the introduction of a notion of absolute time. The
correctness of a real-time task depends not only on the proper sequence of
operations within the task but also on the operations being completed "on
time".

At the systems level this has several ramifications. One is in scheduling:
not only scheduling processes for the processing unit but also scheduling any
shared resource. In the chapters on the communication network, GANGLIA,
we  will  see  how  this  can  affect  the  scheduling  of  access  to  a  communication 
medium.  The  traditional  notion  of  "fairness"  in  scheduling  does  not  hold  up 
well  in  real-time  systems.  One  reason  for  this  is  that  "fairness"  is  usually  a 
probabilistic measure. For instance, processes may be guaranteed access to
a resource eventually, or the average wait may be guaranteed to fall within
certain bounds. To guarantee that a particular process will complete within
real-time limits under these circumstances is difficult. To guarantee that a
set of processes will all complete, each one within its limits, is even more
so. Mok [Mok83] deals extensively with the theoretical problems of
scheduling, and the real-time operating system SAGE [Salk88] [Salk89] is an
example of a system which has removed the various scheduling pitfalls.

Another  ramification  is  that  the  first-in-first-out  (FIFO)  queue  which 
is  ubiquitous  in  traditional  systems  nearly  vanishes  in  real-time  systems. 
The  problem  with  FIFO  queues  is  that  a  process  can  take  precedence  over  a 
higher  priority  process  simply  because  it  is  ahead  of  the  process  in  the  queue. 
Some  systems  make  extensive  use  of  special  priority  queues  to  overcome 
this  whereas  in  some  of  the  systems  we  will  examine  queues  are  eliminated 
altogether. 
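
The FIFO problem described above can be illustrated with a minimal Python
sketch. The task names and priority numbers are hypothetical, chosen only for
illustration, and a lower number is assumed to mean a higher priority. The
sketch compares the service order of a plain FIFO queue with that of a
priority queue for the same arrivals.

```python
import heapq
from collections import deque

# Hypothetical arrivals as (priority, name); a lower number is more urgent.
arrivals = [(3, "logger"), (1, "joint_servo"), (2, "planner")]

def fifo_order(items):
    """Serve requests strictly in arrival order, ignoring priority."""
    queue = deque(items)
    return [queue.popleft()[1] for _ in range(len(queue))]

def priority_order(items):
    """Serve the most urgent request first, regardless of arrival order."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

fifo_result = fifo_order(arrivals)          # logger is served before joint_servo
priority_result = priority_order(arrivals)  # joint_servo is served first
```

In the FIFO case the low priority logger delays the joint servo simply because
it arrived first, which is the behavior a real-time system cannot tolerate.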

Process  synchronization  itself  is  problematic.  When  a  process  waits  for 
another  process  it,  in  a  sense,  gives  up  control  over  its  own  progress  toward 
completion.  The  designer  of  a  process  has  to  have  firm  bounds  on  all  of 
the  synchronization  times  in  order  to  ensure  that  the  process  will  meet  its 
deadlines.  Queues  and  synchronization  form  the  backbone  of  inter-process 
communication in most systems. It is easy to see that inter-process
communication will be affected by the constraints imposed on real-time systems.

This  thesis  examines  these  problems,  with  particular  emphasis  on  data 
communications, in the restricted domain of servo loops. Servo loops form
the lowest levels of most robot control systems. The generic form of a servo
loop consists of a periodically scheduled process which takes as its input the
error in some part of the system (the position of a joint, for instance, or the
force applied by a finger) and produces corrections which, when applied to
the system's actuators, will reduce the error. By restricting our examination
to the domain of servo loops we are given a set of processes with restrictions
and properties that facilitate resolution of the problems mentioned above.
Although they form a special set of processes, servo loops are the foundation
of most robot control systems and are therefore worth the special attention.
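
The generic servo loop described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the control law
is a simple proportional one, the "plant" is a toy model in which the position
moves by exactly the commanded correction, and the function names and the gain
value are hypothetical.

```python
def servo_step(target, measured, gain):
    """One servo cycle: compute the error and return a correction."""
    error = target - measured
    return gain * error

def run_servo(target, position, gain, cycles):
    """Run the loop for a fixed number of periods on a toy plant."""
    history = [position]
    for _ in range(cycles):
        # Toy plant (assumption): the correction is applied directly.
        position += servo_step(target, position, gain)
        history.append(position)
    return history

# Each cycle halves the remaining error with the assumed gain of 0.5.
trace = run_servo(target=1.0, position=0.0, gain=0.5, cycles=4)
```

Each pass through the loop follows the same input, compute, output pattern,
and the error shrinks from one period to the next.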

The contribution of this thesis is the development of several tools for
developing servo systems. The Hierarchical Control System (HIC) is the
core of an operating system for servo loops. It contains an efficient, though
not particularly new, scheduler that handles both periodic and asynchronous
processes. There are severe restrictions on the processes, however. HIC also
includes a novel inter-process communication structure, the periodic data
buffer (or PDB), which is well suited for use with servo loops. HIC has been
successfully implemented and is used to control the Utah/MIT hand [Jaco84]
in the NYU Robotics Laboratory. GANGLIA is a proposed communication
network that would allow the servo components of a robot control system to
be physically distributed around the robot being controlled. The intent is to
permit the processors to be placed near the devices they control to reduce the
amount of data that must be transmitted and to reduce the problems of noise
induced in the signals between the devices and the processors. GANGLIA uses
a combination of simple polling and token passing to provide access to the
communication medium that guarantees transmission of scheduled messages
within their time constraints. It also uses a unique message addressing
scheme that implements a network-wide memory.
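
The addressing scheme can be pictured with a small Python sketch based on the
summary given in the abstract: messages carry the "name" of their contents
rather than a destination address, and each node decides for itself whether to
keep a message. The class, the function, and the message names below are
hypothetical, and the sketch does not model the Communication Memory
Management Unit hardware or the network scheduling.

```python
class Node:
    """A node that keeps only the named values it is interested in."""

    def __init__(self, interests):
        self.interests = set(interests)
        self.memory = {}  # local copy of the network-wide memory

    def on_message(self, name, value):
        # Every node sees every message; selection is by content name.
        if name in self.interests:
            self.memory[name] = value

def broadcast(nodes, name, value):
    """Deliver one named message to every node on the (simulated) network."""
    for node in nodes:
        node.on_message(name, value)

finger = Node({"finger1.target"})
planner = Node({"finger1.position", "finger1.force"})
network = [finger, planner]

broadcast(network, "finger1.position", 0.42)
broadcast(network, "finger1.target", 0.50)
```

The sender never names a receiver; adding another node interested in the same
value requires no change to the sender.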

In the remaining sections of this chapter we present a brief discussion of
servo loops and then overviews of HIC and GANGLIA. The next chapter is
a survey of two areas related to this thesis: operating systems for robot
controllers and real-time communications. Chapters 3 and 4 present HIC;
first it is presented conceptually and then some details of the
implementation are presented. Chapters 5 through 7 present GANGLIA. The
architecture and protocol are discussed in chapter 5 and then the
Communication Memory Management Unit (the central component of the
GANGLIA architecture) is discussed in chapter 6. Analysis of the GANGLIA
protocol using some existing robot systems as models is presented in
chapter 7. Finally, conclusions and avenues for future research are presented
in chapter 8.

1.1     Servo  Loops 

Servo loops have several properties that will be exploited in this thesis:
they are periodic processes with regular schedules, they have
what we will call a "closed form", which will be described below, and
the quantities they work with are continuous quantities.

Servo loops can easily be implemented as regularly scheduled processes.
The rate at which they run depends on the device being controlled and the
computational power of the controller. They must be fast enough to provide
smooth operation of the device and to keep the device within operational
limits. On the other hand, the rate is bounded by the speed of the sensors and
actuators and the computational power of the processor. What is most
important in this work, though, is that they are periodic. Periodic processes
are easily scheduled. A very simple algorithm is the rate monotonic scheduling
algorithm. In this algorithm the priority of a process is proportional to its
rate and at all times the highest priority ready process is run. If there are
no restrictions on what process can preempt another (which boils down to
meaning that there are no mutual-exclusion interactions between the
processes) then this algorithm will successfully schedule any set of processes
with processor utilization less than roughly 70% and in practice it will
typically schedule sets with utilizations of 85% to 90% [Sha86].
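
As a minimal sketch of the scheduling rule just described, the Python below
assigns rate monotonic priorities and checks a task set against the classical
sufficient utilization bound n(2^(1/n) - 1) due to Liu and Layland, which
tends to ln 2 (about 0.693) as n grows and lies behind the figure of roughly
70%. The bound formula is not stated in this chapter and is brought in here
as background; the task set and function names are hypothetical.

```python
def utilization(tasks):
    """Total processor utilization of (compute_time, period) pairs."""
    return sum(compute / period for compute, period in tasks)

def liu_layland_bound(n):
    """Sufficient (not necessary) utilization bound for n periodic tasks."""
    return n * (2.0 ** (1.0 / n) - 1.0)

def rate_monotonic_order(tasks):
    """Highest priority first: the shortest period (highest rate) wins."""
    return sorted(tasks, key=lambda task: task[1])

# Hypothetical servo set: (compute time, period), both in milliseconds.
tasks = [(5, 20), (1, 4), (2, 10)]
ordered = rate_monotonic_order(tasks)
total = utilization(tasks)
guaranteed = total <= liu_layland_bound(len(tasks))
```

A set that fails this test may still be schedulable, which is consistent with
the higher utilizations reported in practice.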

The restriction that processes do not have interactions involving
mutual exclusion is severe, but it is generally applicable to servo loops. Servo
loop programs tend to implement closed form control formulas, which means
computationally that once given their input data they can proceed to
completion without further interaction. This gives servo loops a common flow of
input, compute, output which is made use of in HIC. HIC's periodic data
buffers further enhance this closed form by providing a non-synchronizing
communication structure.

The final major property of servo loops is that they are concerned with
errors in continuously varying values, typically mechanical measurements
(joint position, force, etc.). This means that some of the requirements
usually imposed on inter-process data communication can be relaxed. Consider
a standard situation of two processes involved in transferring a file. It is of
the utmost importance that each piece of the file transferred between the
processes is successfully transferred or the file cannot be reconstructed at
the destination end. Any lost piece of the file invalidates the entire
transfer. Now consider two servo processes, one of which produces input for the
other. Suppose that for some reason the destination process is unable
to obtain the most recent value from the source process. Since the
missing value represents a continuously varying quantity it is feasible for the
receiving process to use the value from the previous cycle. Whether this
is a reasonable action depends on the robustness of the control law
implemented, the magnitude of the other sources of error, and the rate of the
servo. But, within these constraints, the loss of an occasional value is of no
more concern than noise on a sensor. A further requirement here is that
small variations in the input to the process produce small variations in the
process's output. This, though, would follow from implementing locally
continuous control laws. GANGLIA depends to a large extent on the fact that an
occasional lost data packet is acceptable, since much of the traffic in a
GANGLIA network is never acknowledged. PDBs in HIC also depend on this
property.
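
The effect of reusing the previous cycle's value can be seen in a small Python
sketch. The sample values are invented for illustration and the function name
is hypothetical; a missing sample is represented by None.

```python
def hold_last_value(samples):
    """Replace each missing sample (None) with the most recent one received."""
    filled = []
    last = None
    for sample in samples:
        if sample is not None:
            last = sample
        filled.append(last)
    return filled

# Hypothetical joint positions: the true signal and what the receiver got.
true_signal = [0.10, 0.12, 0.14, 0.16]
received = [0.10, 0.12, None, 0.16]

used = hold_last_value(received)
# Error introduced by the stale value is bounded by one cycle's change.
worst_error = max(abs(u - t) for u, t in zip(used, true_signal))
```

Because the quantity varies continuously, the error caused by the lost sample
is no larger than the change in the signal over one servo period.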

1.2    HIC

Hierarchical design is common in computer programs. The reason is
twofold. The first is the convenience of users and programmers. For
instance, an operating system might present disk storage to the programmer
as sequential and random access files, hiding the specific details of disk access
and file maintenance. On top of this are developed text files, indexed files,
relational databases, and so on. Thus the user or programmer is presented
with a "familiar" abstraction suitable for the task at hand. Secondly,
hierarchical programming is good software engineering. If one considers a
system as having the raw devices on the bottom and the application programs on
the top then hierarchical programming represents vertical modularity and
all the various advantages of modular programming apply. The specification
for each step in the hierarchy can be precisely stated, simplifying
development. Changes in the underlying hardware need not propagate changes
throughout the system but can be handled by the lower levels of the
hierarchy. Programming modifications and optimizations can be applied at the
appropriate level(s), again without disrupting the entire system.

All of this applies to robot control programs as well. Furthermore, our
experience with a planar manipulator, the Four Finger Manipulator [Demm88]
[Hor87], has shown that there is a similarity of structure in the lower levels
of the control hierarchy for robot manipulators. HIC is an operating system
meant to exploit the hierarchical structure and the similarity of the control
levels.

In addition, HIC exploits the special characteristics of servo loops
discussed above. The primary characteristic of HIC processes is that they do
not block. Once started they must run to completion, with the possible
exception of being preempted by higher level processes. This property of HIC
processes means that a single stack can be shared by all the processes on a
processor. Switching contexts between servo processes in this system is very
efficient since it involves little more than a subroutine call. HIC [Clar88] was
developed at NYU for our work with the Utah/MIT hand. HIC's scheduler is
derived from Condor's scheduler (Condor is an operating system developed
at MIT). However, HIC is able to schedule asynchronous processes, whereas
Condor is not. HIC assigns priority to periodic tasks on a rate monotonic
basis and ensures that the highest priority task ready to execute is executed.
Inter-process communication typically involves some form of process
synchronization and, therefore, blocking on the part of one or both of the
processes. As part of HIC, a structure called Periodic Data Buffers (PDBs)
was developed which allows inter-process and inter-processor
communication without blocking. PDBs are not general purpose inter-process
communication structures but are well suited for data transfer between servos.
The major characteristics of PDBs are that they do not represent point-to-point
communication but broadcast communication, that they are non-blocking,
and that they always return some data when queried, namely, the last data
placed in the PDB (this is true once some data has been placed in the PDB).
The last characteristic (always having some data available) is similar to the
sticky messages used in the GEM operating system [Schw85]. The
semantics of PDBs are similar to the out() and read() primitives of the language
Linda for distributed systems [GeleSS], but the domain of HIC and PDBs is
much smaller than that of Linda and its communication primitives.
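
The three characteristics listed above can be captured in a short Python
sketch. It is not the HIC implementation: the class and method names are
hypothetical, and the sketch ignores the atomicity a real multi-processor
buffer would need when a write and a read overlap.

```python
class PeriodicDataBuffer:
    """Illustrative PDB: holds only the most recently written value."""

    def __init__(self):
        self._value = None
        self._has_data = False

    def write(self, value):
        """Overwrite the stored value; the writer never waits for readers."""
        self._value = value
        self._has_data = True

    def read(self):
        """Return the last value written without consuming it or blocking."""
        if not self._has_data:
            raise LookupError("no data has been placed in the buffer yet")
        return self._value

# One producer, any number of consumers (broadcast, not point-to-point).
joint_angle = PeriodicDataBuffer()
joint_angle.write(0.31)
seen_by_force_servo = joint_angle.read()
seen_by_planner = joint_angle.read()  # same value; reads do not consume
```

A reader that runs faster than the writer simply sees the same value twice,
and a reader that runs slower misses intermediate values; neither case causes
either process to wait.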

1.3    GANGLIA

Robot control systems typically consist of cooperating micro-computers.
Frequently the processors use shared memory on a common bus for
communication. This is particularly true when the processors must cooperate in
executing low level tasks, such as servo loops. This architecture has great
appeal: construction of the controller is easy, using off-the-shelf busses and
boards; the system can be quite modular and expandable for the same
reason; shared memory is conceptually simple and flexible for the programmer;
and shared memory is a reliable and fast communication medium. Problems
arise as robot systems get more complex. There are mechanical and
electrical limits to the expansion of shared memory systems which put rather
small bounds on the number of boards that can be in a system. Centralized
processing in a single box requires extensive cabling to the various
sensors and actuators. This is often awkward, constraining the placement of a
robot and thus inhibiting expansion. To further complicate matters, it is in
the nature of robot arms that the most interesting, delicate, and complex
component is usually the end-effector, which is farthest from the central
controller. Thus the most numerous and most delicate signals must travel
the greatest distance, subjecting them to both electrical and mechanical
interference. Complex sensors now in development, such as tactile sensing
arrays, compound the problem because of the multitude of connections
required. One way to solve the problem of cabling between sensors, actuators,
and processors is to put the processors near the devices. This, though,
poses a communication problem between the various processors: among the
processors controlling the devices, and between these processors and
higher-level control processors. GANGLIA is a robot controller architecture and
communication protocol meant to address these issues. Distributing a robot
controller within the robot presents many problems, for instance, those
related to weight, size, packaging, power distribution, and software tools. The
current research on GANGLIA is focused on the communications problems
and only lightly touches on these further topics.

The  vision  behind  GANGLIA  is  that  one  might  someday  have  components 
the  size  of  a  matchbook  capable  of  the  low  level  control  of  actuators  and/or 
sensors. This would include the necessary interface electronics, processing
power, and communication electronics. A single cable would connect the
component  to  the  network  of  components  and  provide  electrical  power  for 
the  component  as  well  as  the  communication  link.  These  components  or 
nodes  would  be  scattered  about  the  robot  as  needed.  Other  nodes,  more 
centrally  located,  would  provide  coordination  and  higher  level  control  and 
communicate with still higher controllers (e.g., the operator or a multi-robot
controller). Nodes are not restricted to these matchbook size components;
high level control or communication nodes or nodes requiring intense
computation may be full-blown computers. Each "independent" robot, for instance
an arm and its end-effectors, would have its own network. On complex
systems one might expect dozens of nodes while in some instances hundreds
may  be  required.  Much  needs  to  be  done  before  such  a  vision  is  realized. 
Developing  a  communication  architecture  suitable  for  this  network  is  an 
important  step  in  this  direction. 


Figure  1.1:  Utah/MIT  hand  and  PUMA  arm. 


1.4     Utah/MIT  Hand 

The impetus for both HIC and GANGLIA came while considering the design of
a controller for the Utah/MIT Hand [Jaco84], an anthropomorphic hand with
four fingers, one of which is opposed to the other three and acts as a thumb.
Each finger has four joints for a total of 16 degrees of freedom and each joint
has a position sensor and a torque sensor. The pneumatic cylinders, two per
joint, which provide power for the joints are housed in a power pack located
roughly four feet from the hand itself. Tendons connecting the joints and
the  cylinders  are  routed  through  a  flexible  linkage.  This  linkage  gives  the 
hand  the  freedom  to  move  about  in  a  cube  of  about  two  feet  per  edge  while 
the  power  pack  remains  stationary.  The  hand  is  controlled  by  a  controller 
consisting of several micro-processors on a common bus, an architecture very
common in robot controllers. Sensory data runs along cables connecting the
32  tension  sensors  and  16  position  sensors  in  the  hand  to  the  controller  and 
actuator  signals  are  sent  from  the  controller  to  the  32  cylinders  in  the  power 
pack.  The  hand  alone  is  a  complex  robot  by  contemporary  standards  but 
by itself it is of very limited use. To provide mobility the hand must be
attached to an arm, which adds several more degrees of freedom and sensing
to the system. To use the hand for dextrous manipulation, tactile sensors
must be added to the fingertips [Jaco88] [Spee88b]. These add a great deal of
complexity to the system. This system (the Utah/MIT hand, an arm, and
tactile sensors) is quite complex, but it would be short-sighted to think this is
in any sense the ultimate in robot complexity, even if only the near future
is considered. Figure 1.1 is a photograph of the hand and its power pack.
The  hand  is  attached  to  a  PUMA  560  arm. 


Chapter  2 


Real-Time 
Data 


Operating systems for robot control systems fall within the category of
real-time operating systems. Perhaps the most salient feature of real-time
operating systems is preemptive scheduling, which means that it is possible
for a high priority task or tasks to demand immediate access to the processor
so that some real-time constraint can be met. A characteristic of low level
robot control, the servo loop, permits further refinement of the operating
system to the point that some of the systems discussed in this survey bear little
resemblance to normal operating systems. Servo loops demand repetitive
and timely service and a robot control system is likely to have many loops.
Special scheduling techniques can be used because of the repetitive nature
of the loops. The demands of timely service (particularly in high frequency
loops of low level control) require a low tolerance for blocking of tasks for
indeterminate lengths of time. As a result one finds that queues play a
diminished role in the systems presented in this survey when compared to
their role in normal operating systems.

For modern control systems the architecture almost always consists
of several micro-computers on a shared bus. Economics is clearly one reason
for this but the modularity provided by this architecture is also important.
In a single computer environment any addition to a system, whether a new device
or a higher level of control, degrades the performance of all the other
components of the system. For instance, an addition to a high level planner may
starve an otherwise accurate joint servo process or a critical safety
procedure. In practice, with single board computers, only the interaction with
components closely associated with a new component or feature need be
considered.

Multi-processor architectures are used in all the systems discussed here
but they tend to be loosely coupled systems. Typically, processes are
statically assigned to processors and there is no attempt to dynamically
distribute the load among the processors. As above, both economics and
simplicity of design account for this characteristic.

Despite  the  high  computing  power  often  found  in  these  systems  it  is 
appropriate  to  consider  them  as  small  systems.  Although  they  are  typically 
multi-tasking  and  multi-processor  systems  they  are  "single  user"  systems  in 
the  sense  that  the  system  is  dedicated  to  a  single  "job".  So  just  as  a  user 
may  go  through  phases  of  editing,  compiling,  and  executing  a  program,  a 
robot  system  may  go  through  markedly  different  states.  Consider  a  robot 
arm  with  a  force  sensing  tool  at  the  end.  When  the  tool  is  not  in  contact 
with  any  object  the  arm  may  move  freely  using  only  desired  position  to 
control  the  motion,  but  as  the  tool  approaches  an  object  a  different  control 
regime  may  be  used  to  make  contact  with  the  object  and  then  another 
control  regime  may  be  invoked  to  have  the  tool  perform  the  desired  tasks. 
The entire system including the operating system may reflect these different
regimes: different types of processes may be used, different methods of
inter-process communication may be available, and so on. This is in contrast to a
time-sharing operating system which struggles to maintain the single state
of being "up" in which all of its functions are available. Another attribute
of smallness is that robot control systems tend to have a single address
space (or maybe one per processor). This reduces overhead during context
switches and simplifies inter-process communication.

2.1     Real-time  Operating  Systems 

This  section  presents  some  existing  robot  control  systems.  The  references 
cited  have  a  range  of  focuses,  from  implementation  of  a  particular  robot 
system,  to  presentation  of  a  robot  programming  language,  to  discussion  of 
general  characteristics  of  real-time  systems.  For  our  purposes,  the  following 
three  aspects  are  important:  1)  the  computational  architecture  proposed  or 
required  by  the  system  (there  is  little  variation  here),  2)  the  nature  of  tasks 
or  processes  in  the  system  and  the  means  of  inter-process  communication, 
and 3) the programming style implied or required by the authors.

2.1.1     AL 

AL [Fink74] is a language with an embedded operating system for programming
robots in assembly tasks. It was developed during the mid 1970s, before
the advent of micro-computers, so the restrictions on real-time computing
power that so profoundly mark AL are greatly reduced today. However, AL
is interesting because of the tremendous scope addressed by the language
and the techniques used to overcome the limitations in computing power.

AL  was  developed  in  an  environment  where  there  was  very  little  run-time 
computing  power  compared  to  today's  systems.  Much  more  was  expected 
of the off-line compiler. For example, before a move command can be
compiled the compiler must know, at compile time, the positions the arms are
expected to have at run time! One consequence of this is that there are
no run-time subroutines. A subroutine can expect to be called in various
circumstances, so the compiler, when compiling a subroutine, has no expected
run-time values for the subroutine's variables or parameters. Libraries of
routines are implemented but are compiled as macro expansions.

The  AL  run  time  system  consists  of  five  types  of  processes  distinguished 
by  function  and  priority.  In  order  of  their  priority  (high  to  low)  they  are: 

•  Joint  servos  and  encoder  input 

•  Clock,  calendar  (scheduler) 

•  Condition  monitors 

•  Servo  predictors 

•  Interpreters 

AL  divides  time  into  equal  length  slots,  1  msec  in  length.  In  one  slot  at 
most  one  of  the  joint  servo  or  encoder  sensing  processes  is  scheduled.  This 
process  is  given  the  first  opportunity  during  the  slot.  Lower  level  processes 
then  compete  for  the  remainder  of  the  slot. 

Each joint servo process is paired with a lower priority servo predictor
process. When started, a joint servo process drives a single actuator and
then hands control to the corresponding predictor process. The predictor
first schedules the next execution of the servo process by consulting the
calendar. Knowing the time the servo process will next run and the
current position and velocity of the joint, the predictor uses the trajectory
polynomial to predict the state of the joint when the servo process is next
scheduled, and thus determines the action to be taken by the servo process.
When the reserved slot arrives the servo process uses the planned values,
with slight modification based on more recent information, to drive the
actuator. When a motion is complete or aborted the servo process and
predictor process remove themselves from the system.

Condition monitors are processes that monitor various parts of the
system, for example the length of time taken by a move, sensed forces, or touch
pad sensors. When a condition monitor is tripped it initiates actions
specified by the compiled code. The action may be critical, in which case it
is immediately executed, or it may simply be scheduled along with other
processes.

Interpreters are the lowest level processes. These provide the higher level
run-time control. They start moves, enable and disable monitors, perform
calculations, and spawn new processes. The interpreter processes interpret
code for a virtual stack machine. Presumably, the reasons for this are that it is
easy to implement re-entrant procedures this way, the context for processes
can be controlled to minimize context switch times, and code generation for
the compiler is simplified.

As mentioned previously, all trajectories are planned by the compiler
based on assumed values for the various arm variables. At run time the
real values will not, in general, match the values assumed by the compiler.
Before a move is started the real state of the system is compared to the
assumed state. If the difference is small then small adjustments to the planned
trajectory can be made that preserve the desired characteristics of the move.
If, on the other hand, the difference is large then a more brute force method
is used to bring the robot arm to the desired configuration.

Lazy evaluation is used on the run-time variables. Each variable has two
cells associated with it, a value cell and a node cell. The value cell holds the
variable's current value (e.g. a scalar, a vector, a plane, or a frame). The node
cell holds information about the freshness of the value and how to calculate
a fresh value, if needed. This calculation may require that other variables
be updated, and so on until the required fresh value is available. The node
cell points to a list of procedures that will bring the variable up to date. The
node cell also points to a list of variables that depend on this variable, i.e.
the variables that must be marked invalid if this variable becomes invalid.
These nodes thus form a graph that represents the state of the run-time
variables.
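
The value cell / node cell arrangement can be illustrated with a small
sketch. This is not AL's implementation; the class and attribute names
(Variable, fresh, dependents) are hypothetical, and a single recompute
procedure stands in for the list of procedures held by the node cell.

```python
class Variable:
    """Run-time variable with a value cell and a node cell (hypothetical names)."""

    def __init__(self, name, compute=None, inputs=()):
        self.name = name
        self.value = None            # value cell
        self.fresh = False           # node cell: is the value up to date?
        self.compute = compute       # node cell: how to calculate a fresh value
        self.inputs = list(inputs)   # variables the calculation reads
        self.dependents = []         # node cell: variables that depend on this one
        for v in self.inputs:
            v.dependents.append(self)

    def set(self, value):
        """Store a new value and mark everything that depends on it invalid."""
        self.value = value
        self.fresh = True
        for d in self.dependents:
            d.invalidate()

    def invalidate(self):
        # A stale variable never has fresh dependents, so stop early.
        if self.fresh:
            self.fresh = False
            for d in self.dependents:
                d.invalidate()

    def get(self):
        """Lazy evaluation: recompute only when the value is stale."""
        if not self.fresh:
            args = [v.get() for v in self.inputs]   # may update other variables
            self.value = self.compute(*args)
            self.fresh = True
        return self.value
```

In this sketch a call to set() only marks dependents invalid; no
recalculation takes place until some later get() actually needs the value.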

2.1.2    NRTX 

The New Real-Time Executive (NRTX) [Kapi84] is a real-time control
system developed at Bell Laboratories. The strategy of the developers was to
take an existing time-sharing system, UNIX, remove unnecessary and time
consuming features (e.g. file system, multi-user support), and add real-time
support features (mainly additional interprocess communication features).
The appeal is that the programmer has a familiar and presumably
comfortable system interface. Also, it is possible to take advantage of existing libraries
and tools in some cases.

Processes  in  NRTX  are  regular  UNIX  processes  but  with  a  single  address 
space  (in  UNIX  a  spawned  process  gets  a  copy  of  the  parent's  variables). 
Four  methods  of  interprocess  communication  are  supported. 

•  Shared  memory  in  which  processes  simply  refer  to  the  same  location. 

•  Signals, which are a traditional UNIX form of inter-process signaling.
The model for signals is hardware interrupts. A process sets up
vectors of signal handlers (much like hardware interrupt vectors) and then
proceeds with normal processing. When a signal is received from
another process the appropriate signal handler routine is invoked as an
asynchronous subroutine call.

•  Traditional  semaphores  are  also  available.  Cooperating  processes  use 
one  of  a  pool  of  semaphores  and  the  shared  resource  can  be  claimed 
and  released  using  P  and  V  operations. 

•  Processes are also able to pass messages. Messages are of fixed size,
consisting of a message type, message length, and a pointer to a buffer
or structure. Each process has a unique queue associated with it for
messages. Usually, transmitting a message is quite rapid since it is
merely a matter of copying the short message and appending it to the
receiver's queue. There are restrictions, however, on the number of
messages any one process can have queued and on the total
number of messages in the system, so it is possible that a message
cannot be immediately transmitted. In this case the sending process has
the option of blocking until the message is successfully transmitted
or returning immediately and canceling the request to send. When
receiving a message a process specifies a range of message types it
will accept at this time and can optionally wait until an appropriate
message is received.
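
The message passing scheme described in the last item can be sketched as
follows. This is a minimal model, not NRTX code: the names MessageSystem,
try_send and try_receive are hypothetical, only the non-blocking variants
are shown, and the per-process limit is assumed to apply to the receiver's
queue.

```python
from collections import deque


class MessageSystem:
    """Fixed size messages (type, length, buffer reference) on per-process queues."""

    def __init__(self, per_process_limit, system_limit):
        self.per_process_limit = per_process_limit   # assumed: limit per receiver
        self.system_limit = system_limit
        self.queues = {}
        self.total = 0                               # messages queued system-wide

    def register(self, pid):
        self.queues[pid] = deque()

    def try_send(self, dest, msg_type, length, buffer):
        """Non-blocking send: returns False (request cancelled) if a limit is hit."""
        queue = self.queues[dest]
        if len(queue) >= self.per_process_limit or self.total >= self.system_limit:
            return False
        queue.append((msg_type, length, buffer))     # only the reference is copied
        self.total += 1
        return True

    def try_receive(self, pid, lowest, highest):
        """Return the oldest message with lowest <= type <= highest, else None."""
        queue = self.queues[pid]
        for i, message in enumerate(queue):
            if lowest <= message[0] <= highest:
                del queue[i]
                self.total -= 1
                return message
        return None
```

A blocking send or receive would simply retry these operations when the
process is next scheduled.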

NRTX essentially makes no statement about how real-time control
programs should be written. The UNIX environment and model are presented
to the programmer with some enhancements and the programmer is free to
choose the appropriate style and tools.

2.1.3     Harmony 

Harmony [Gent84] [Gent81] [Boot82] is a real-time system developed at the
University of Waterloo. It is a derivative of Thoth [Cher82], a general purpose
operating system.

The  paradigm  used  for  interprocess  communication  is  message  passing. 
Four  functions  are  used  to  implement  the  communication.  Each  function 
returns  a  process  identifier  or  error  indicator. 

•  _Send(request, reply, id). Request points to a variable length
message to be passed to the process indicated by id (the length is
in the first two bytes of the message). Reply points to a buffer to
receive the reply from the receiving process (again variable length).
The sending process blocks until the receiving process explicitly replies
to the request.

•  _Receive(request, id). If id is non-zero the receiving process
blocks until there is a message from the indicated process. If id is
zero any message will be received and the process will block only if
no messages are pending. The incoming message is copied into the
area pointed to by request; if the incoming message is too long it is
truncated. The identifier of the sending process is returned.

•  _Try_receive(request, id). This is the same as _Receive except
that the process never blocks. If no message is available then a zero
value is returned.

•  _Reply(reply, id). The process indicated by id must be blocked
waiting for a reply from this process (i.e. it must have performed a
_Send). If so, the message pointed to by reply is copied into the reply
area specified by the sending process.

Three points are important here. The first is where processes block. The
sender blocks until the receiver explicitly replies. The receiver, on the other
hand, does not block during a _Reply since the sender has provided a buffer.
In fact the receiver only blocks when there are no messages (i.e. when there
is nothing to do, as we shall see). Second, replies need not be made in
any particular order. Third, all buffering is handled by the processes. This
eliminates the overhead of system buffer management and, more importantly,
means that there is no hidden blocking while waiting for a buffer. Note that
if the processes are in different address spaces some sort of system buffering
most likely takes place. Gentleman mentions the problem but does not
address it; however, one can imagine a scheme in which the request message
is not copied into the receiver's address space until the receiver is ready for
it. This is as reliable as the underlying medium. Note also that the receiver
will block during the communication delays, but what is important is that
this blocking will not lead to deadlock if the medium is reliable.
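
The blocking behaviour of the four primitives can be made concrete with a
small state model. This is only a sketch under stated assumptions: it is
single threaded, it records process states rather than suspending anything,
and the names Kernel, inbox, waiting_on and reply_buf are hypothetical and
not part of Harmony. Process identifiers are assumed to be non-zero.

```python
class Kernel:
    """Toy model of send / receive / reply blocking; not Harmony's implementation."""

    def __init__(self, pids):
        self.state = {p: "ready" for p in pids}
        self.inbox = {p: [] for p in pids}   # pending (sender, request) pairs
        self.recv_from = {}                  # receive-blocked pid -> awaited id (0 = any)
        self.waiting_on = {}                 # blocked sender -> process it sent to
        self.reply_buf = {}                  # sender -> reply copied into its buffer

    def send(self, sender, request, dest):
        self.inbox[dest].append((sender, request))
        self.waiting_on[sender] = dest
        self.state[sender] = "send_blocked"  # stays blocked until reply()
        if (self.state[dest] == "receive_blocked"
                and self.recv_from[dest] in (0, sender)):
            self.state[dest] = "ready"       # receiver now has something to do

    def try_receive(self, pid, from_id=0):
        """Never blocks; returns (0, None) when no suitable message is pending."""
        for i, (sender, request) in enumerate(self.inbox[pid]):
            if from_id in (0, sender):
                del self.inbox[pid][i]
                return sender, request
        return 0, None

    def receive(self, pid, from_id=0):
        sender, request = self.try_receive(pid, from_id)
        if sender == 0:                      # nothing pending: the receiver blocks
            self.state[pid] = "receive_blocked"
            self.recv_from[pid] = from_id
        return sender, request

    def reply(self, pid, reply, dest):
        """Returns dest on success, 0 if dest is not awaiting a reply from pid."""
        if self.waiting_on.get(dest) != pid:
            return 0
        del self.waiting_on[dest]
        self.reply_buf[dest] = reply         # sender supplied the buffer
        self.state[dest] = "ready"
        return dest
```

In this model the receiver can hold several requests at once and answer
them in any order, while each sender remains blocked until its own reply
arrives.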

2.1.4     SAGE 

SAGE  is  an  operating  system  for  real-time  supervisory  control  developed  at 
NYU  [Salk88]  [Salk89].  SAGE  is  intended  to  integrate  and  control  relatively 
autonomous  robotics  systems.  As  a  consequence,  SAGE  is  more  concerned 
with the higher level procedures than the servo loops. SAGE is much more
a full-fledged operating system than any of the others described in this
section. Like Harmony and NRTX, it supports normal independent processes.
Unlike Harmony, it supports a wide range of inter-process communication
schemes. Unlike any of the other systems, SAGE supports memory
management facilities, which are usually considered too expensive to support in a
real-time environment.

SAGE supports normal (full-weight) processes and also supports a
special type of light-weight process for handling interrupts. A problem
presented by hardware interrupts is that they are outside the process priority
scheme. A hardware interrupt supersedes even the highest priority normal
process even though some of its tasks are actually of low priority. SAGE
implements callouts, which are processes that have restricted functionality but
low overhead and can be scheduled by interrupt procedures. Thus a
hardware interrupt can perform its most critical functions immediately and
then ready a callout to perform the lower priority tasks. The callout then
competes for the processor based on its priority.

In SAGE each process has its own address space, which can be expanded
dynamically. The separate address spaces are enforced by a memory
management unit. Separate address spaces facilitate debugging since the
processes are isolated and an errant pointer will be more quickly detected.

SAGE supports two low level synchronization primitives which are
intended as bases on which to build more sophisticated operations. The two
are a typical mutual exclusion locking mechanism and condition variables,
which are intended to support monitor-like constructs. What is interesting
about these primitive operations is that they are implemented in user level
code. Thus in the case where there is no conflict for a resource (in many
situations, the most common case) there are no system calls involved in
obtaining or releasing the resource.

2.1.5     OWL 

In his thesis [Donn84] Marc Donner pursues two themes. One is the
decomposition of walking (particularly insect walking) into loosely coupled simple
sub-tasks. The other is the real-time programming language OWL, which
is well suited for specifying this sort of decomposition. To demonstrate the
effectiveness of OWL, Donner wrote a program to control the SSA
(Sutherland, Sproull, and Associates) walking machine. The SSA walking machine
is a six legged machine large enough for a human operator to ride upon.

OWL is designed to control the walking of an insect-like mechanism. As
in insects, control is distributed and loosely coupled. Further, since only
static stability is desired, the real-time constraints are not as oppressive as
they would be for the dynamic stability of a hopping or running machine.

In  OWL  the  primitive  executable  unit  is  the  process.  Every  statement  is 
a  process.  In  the  run  time  system  processes  are  cheap  so  the  overhead  is  not 
great.  Donner  estimates  that  as  implemented  the  cost  of  starting  a  process 
is  about  8  times  the  cost  of  invoking  a  C  function.  A  process  interacts  with 
other  system  components  in  four  ways: 

•  It  can  become  active,  i.e.  compete  for  the  processor. 

•  It  can  terminate,  which  may  allow  other  processes  to  become  active. 

•  It  can  assert  an  alert  signal.  This  will  cause  certain  other  processes 
to  terminate.  This  can  only  be  used  once.  This  is  the  only  system 
supported  means  of  inter-process  communication. 

•  It can cause side effects, i.e. change global variables, read sensors, or
affect actuators. In the walking program this is the method used for
a leg to communicate with its neighbors.

Processes can be combined in two ways into more complex processes. A
sequence is a list of processes that are performed in the order in which they
are coded, each waiting until the previous has completed before starting.
Each sequence is a loop which is terminated either by an explicit call to a
termination process or by an alert signal from another process. The other
way of combining processes is a concurrence. A concurrence is a list of
processes performed concurrently. The concurrence terminates when all its
sub-processes have terminated. When a process asserts its alert signal all
other processes in the same concurrence are terminated. Thus a process can
only signal processes included at compile time in the same concurrence.

OWL can only exist in an environment in which processes are very cheap.
For instance, no case statement exists but the effect is achieved by a
concurrence of processes, each representing a branch of the case, with each
inappropriate process (branch) immediately terminating itself. Although
conceptually OWL works this way, a smart compiler could improve the resulting code
by actually implementing a case statement when the semantics permit.

2.1.6     Condor 

Condor  [Narti86]  [Sieg85]  is  a  system  developed  to  control  the  Utah/MIT 
hand  [Jaco84]. 

Condor does not support processes in the normal sense. The basic unit
of execution is the routine. A routine can be performed asynchronously,
usually at the request of an external processor. However, procedures are
generally scheduled for repetitive execution. Thus, the flavor of Condor is
of multiple loops which are, from the system's point of view, independent of
each other. Each loop consists of a rate and a procedure to be invoked. For
a loop to run at 50 Hz, the scheduler promises that the procedure will be
executed once every 20 msec but there is no guarantee that the procedure will
be scheduled at exactly 20 msec intervals. In this instance, the interval
between invocations could be almost 40 msec in one case and nearly zero
the next. The priority for execution is determined by the servo rate; thus
higher frequency loops are given preference and can preempt slower servos
(rate monotonic scheduling). Because the servo loops are not coroutines they
can use the same stack; in fact, only one stack is used for each processor.

The servo loop scheduler, or SLS, maintains a list of servos in a process
table in priority order. To find a servo procedure to invoke, the SLS scans
the process table and invokes the first marked procedure. A procedure is
scheduled by marking its entry in the table. Thus for the 50 Hz example
there is a procedure that every 20 msec marks the entry for the servo's
procedure, and the SLS will in a timely manner discover the mark and
invoke the procedure.
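
The mark-and-scan mechanism can be sketched as below. This is an
illustration, not Condor code: the class and method names are hypothetical,
a 1 msec clock tick is assumed as the marking procedure, loop rates are
assumed to divide 1000 evenly, and preemption of a running procedure by a
faster loop is not modeled.

```python
class ServoLoopScheduler:
    """Process table kept in rate order; scheduling is done by marking entries."""

    def __init__(self):
        self.table = []   # entries: [rate_hz, procedure, marked]

    def add_loop(self, rate_hz, procedure):
        self.table.append([rate_hz, procedure, False])
        # Rate monotonic order: the highest frequency loop comes first.
        self.table.sort(key=lambda entry: -entry[0])

    def tick(self, now_ms):
        """Assumed 1 msec clock: mark every loop whose period has elapsed."""
        for entry in self.table:
            period_ms = 1000 // entry[0]
            if now_ms % period_ms == 0:
                entry[2] = True

    def run_next(self):
        """Scan in priority order and invoke the first marked procedure."""
        for entry in self.table:
            if entry[2]:
                entry[2] = False
                entry[1]()
                return True
        return False      # nothing is marked
```

Because a procedure runs whenever its mark is discovered rather than at a
fixed instant, the spacing between two invocations of the same loop can
vary, as noted above for the 50 Hz example.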

The primary tool for communication between the loops is messages.
Messages are sent via "mail-box interrupts". The source loop interrupts the
destination processor by writing to a special location in the processor's address
space. Along with the interrupt the source routine specifies a routine to be
run on the destination processor and an argument to pass to the routine.
The source routine can then continue or wait for completion of the initiated
routine.

2.1.7     NYMPH 

NYMPH [Chen86] is a control system developed at Stanford and used to
control the Stanford/JPL hand [Maso85]. The hand has three fingers with
four tendons per finger and a tension sensor on each tendon.

NYMPH provides no operating system for the 32016 processors; instead
two libraries are provided. One library contains routines for message
passing between the 32016's and the V system, which runs on the host
computer [Cher83]. The second contains synchronization primitives. The V
system acts strictly as a server. It cannot initiate communication with the
client machines. To communicate with the server, a client machine forms
a message, passes it to the server via shared memory, and then interrupts
the server, which examines the message and invokes the appropriate
handler. The client processor waits in a busy wait loop for the message to be
returned. This is very similar to Condor's mail-box scheme.

The synchronization primitives consist of two routines called
synch_signal(n) and synch_wait(n, patience). To participate in a
synchronized event a processor calls synch_signal(n) where n indicates the event.
The processor is then obligated to later perform synch_wait(n, patience)
on the same event. Typically, when synch_wait is called the processor blocks
until all processors participating in the event have invoked synch_wait. The
second argument, patience, modifies the blocking. If patience is zero the
processor does not block; if it is greater than zero the processor will block
but will time out after an interval proportional to patience. If patience is
less than zero the processor blocks until synchronization is complete.
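
The three cases for patience can be illustrated with threads standing in
for processors. This is a sketch, not NYMPH's library: the Synch class, the
seconds_per_patience scale factor, and the boolean return value are
assumptions, events are treated as one-shot, and a processor that times out
is still counted as having arrived.

```python
import threading


class Synch:
    """Toy model of synch_signal / synch_wait using threads as processors."""

    def __init__(self, seconds_per_patience=0.01):
        self.cond = threading.Condition()
        self.joined = {}     # event -> processors that called synch_signal
        self.arrived = {}    # event -> processors that called synch_wait
        self.unit = seconds_per_patience   # assumed scale factor for patience

    def synch_signal(self, n):
        """Announce participation in event n."""
        with self.cond:
            self.joined[n] = self.joined.get(n, 0) + 1

    def synch_wait(self, n, patience):
        """Return True if every participant has arrived, False otherwise."""
        def complete():
            return self.arrived[n] >= self.joined.get(n, 0)

        with self.cond:
            self.arrived[n] = self.arrived.get(n, 0) + 1
            self.cond.notify_all()         # wake processors already waiting
            if patience == 0:
                return complete()          # never block
            # Negative patience waits indefinitely; positive patience times out.
            timeout = None if patience < 0 else patience * self.unit
            return self.cond.wait_for(complete, timeout)
```

A caller that receives False has either declined to block or has run out
of patience, and must decide for itself how to proceed.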

NYMPH is truly minimal, providing almost none of the "normal"
operating system services. The V system loads "raw" programs into the processors
and turns them on with only the slim connections of messages and
synchronization. There is no support to help the user to distribute the processing
load among the processors. The authors hint that a multi-tasking
operating system could be put on top of NYMPH and one suspects that any
significant development effort would require the implementation of a more
complete operating system on top of NYMPH.

2.2  Summary

Two trends are discernible in the systems presented here. Two of the
systems, NRTX and Harmony, provide a standard multitasking environment.
The systems are modified for real-time use but each task is presented a
virtual machine and operates asynchronously. The remaining systems discard
conventional features and impose constraints as needed to enhance the
real-time response of the system. NYMPH is the most extreme example of what
might be called "veneer operating systems" that provide only a thin layer
between the user and the raw machine (Condor also falls in this class). In
part, this latter trend is driven by definitive features of robot control, most
prominently the servo loop. But more important is the attempt to deliver as
much of the raw power of the underlying architecture to the control of the
robot as possible. Just as time-sharing systems have provided more services
to the users as the hardware technology has provided more computing power,
it is reasonable to expect that robot control systems will grow in
sophistication with the underlying technology. But robotics is a young field and the
complexity of the devices to be controlled is growing rapidly, so the need for
minimal operating systems will persist, at least for the near future.

2.3  Real-Time  Networks 

The literature on computer communications networks is very rich, yet
relatively little is written about real-time communication or, as it is often
called, time-constrained communication. The situation is very much the
same as in the area of operating systems, where the traditional measures of
fairness, completeness (i.e. that each operation is completed as if performed
serially), and throughput are sometimes in conflict with the demands of
real-time systems for predictability and timeliness. In the traditional computer
communication literature average time delay (the time between a message's
creation and its delivery) is the dominant measure of a network's effectiveness
and there is a trade-off between the network's throughput and the average
time delay. In time-constrained communication, a message must be
delivered prior to some deadline or it is considered lost regardless of whether or
not it is actually transmitted or received. In this context it is more useful
to measure the percent of messages lost (essentially a measure of the worst
case time delay) than to measure the average time delay. So in a
time-constrained network the trade-off is between throughput and message loss
[Kuro84b].
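
The difference between the two measures can be seen in a small
calculation. The delay figures and the 10 msec deadline below are invented
purely for illustration, and a message arriving exactly at the deadline is
assumed to be lost.

```python
def average_delay(delays):
    """Traditional measure: mean time between creation and delivery."""
    return sum(delays) / len(delays)


def percent_lost(delays, deadline):
    """Time-constrained measure: share of messages not delivered before the deadline."""
    late = sum(1 for d in delays if d >= deadline)
    return 100.0 * late / len(delays)


# Hypothetical delays in msec for two networks carrying the same traffic.
network_a = [2, 2, 2, 2, 22]   # fast on average, with one very late message
network_b = [8, 8, 8, 8, 8]    # slower on average, but always before the deadline
deadline = 10

summary = {
    "A": (average_delay(network_a), percent_lost(network_a, deadline)),
    "B": (average_delay(network_b), percent_lost(network_b, deadline)),
}
```

Here network A has the lower average delay (6 msec against 8 msec) yet
loses 20% of its messages, while network B loses none, so the two measures
rank the networks in opposite order.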

Another difference between traditional computer communication and
time-constrained communication is reliability. Traditional computer
communication protocols typically provide 100% reliability, that is, in a
conversation between two processes all messages are delivered and delivered
in order. Datagram protocols, which promise merely to make an effort to
deliver a message, do exist and are widely used but they assume that
reliability is ensured by some higher level protocol. It is a rare application in
a time-sharing system that does not depend on reliable communication. In
time-constrained communication such as that between servo loops in a robot
controller, however, some message loss is tolerable. So, it is feasible to trade off
reliability for lower delay in message delivery. The reason that it is possible
to trade off reliability for delay is that ensuring reliable delivery increases
the demand on the network because messages must be acknowledged (often
requiring a separate message) and because lost messages must be
retransmitted. Frequently, time-constrained protocols make no effort to ensure
delivery of a message beyond ensuring that it is successfully transmitted
[Kuro84b] [Kuro84a] [Zhao87] [Fine86]; this is also true of GANGLIA. It
should be noted that the reason it is possible to ignore reliability is that the
underlying medium (typically a local area broadcast medium) is inherently
very reliable [Gons83].

2.3.1     Integrated  Service  Digital  Networks 

An area in the broader field of electronic communications that is relevant is
integrated service digital networks (ISDN). With the advent of digital
computer networks, telephone companies and agencies throughout the world
have become interested in providing both traditional telephone service and
data communication with a single network [Rutk82]. This interest has
generated a great deal of research in integrated service digital networks.

Electronic networks are divided into three broad classes: circuit switched,
message switched, and packet switched [Tane81] [McNa78]. Consider a
traditional local telephone call. The act of dialing the call establishes an
electronic circuit between the two parties. This circuit is maintained for the
duration of the call and the resources used for the circuit (relays, wires,
etc.) cannot be used by another conversation until the original call is
completed and the resources are reclaimed. This is an example of a circuit
switched network.

The second type of network, message switching, can be typified by the
telegraph network. A telegram may not be transmitted directly from the
source office to the destination office but may be sent to an intermediate office
which may hold the message until a convenient time when it will be
transmitted to the next office. This "store and forward" sequence is repeated until
the message reaches the destination office. Network resources are reclaimed
after each message is successfully transmitted.

When a data file is transferred between computers in a packet switched
network, the file is divided into equal size packets which are put on the
network individually. In a complicated network such as the ARPA network
[Tane81], which is a store and forward network, the packets will not
necessarily travel by the same path between the source and destination computers
or hosts. The packets may not arrive at the destination in the proper order
and some of the packets may be lost entirely and have to be re-transmitted.
It is the responsibility of communication software on the respective hosts to
ensure that the file is eventually received by the destination host, complete
and in the proper order. The network sees only the packets and is only
minimally aware, if at all, of how the packets are related to each other. In
particular, the resources of the network (transmission media, buffers, etc.)
can be reclaimed and re-used at the end of each packet regardless of its
relationship to other packets of the same conversation.
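
The host's side of this arrangement can be sketched in a few lines. The
function names and the use of a sequence number on each packet are
assumptions made for illustration; they do not describe any particular
protocol, and in this sketch the final packet may be shorter than the rest.

```python
def packetize(data: bytes, size: int):
    """Split a file into (sequence number, payload) packets of bounded size."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]


def missing(packets, count):
    """Sequence numbers not yet received; these must be re-transmitted."""
    received = {seq for seq, _ in packets}
    return [seq for seq in range(count) if seq not in received]


def reassemble(packets):
    """Restore the original order regardless of the order of arrival."""
    return b"".join(payload for _, payload in sorted(packets))
```

For example, a 20 byte file sent as 6 byte packets yields four packets; if
they arrive in the order 2, 0, 3, then missing() reports packet 1, and once
it has been re-transmitted reassemble() returns the original file.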

The  difference  between  message  switching  and  packet  switching  is  that 
in  a  message  switched  network  the  intermediate  hosts  (or  offices)  must  be 
able  to  store  and  retransmit  an  arbitrarily  long  message  whereas  the  packets 
in  a  packet  switched  network  are  bounded  in  size.  This  simplifies  resource 
(particularly  buffer)  management  in  the  hosts.  On  the  other  hand,  in  a 
packet  switched  network  the  receiving  host  must  be  able  to  reconstruct  the 
original "message" from the packets. Local area networks such as the
Ethernet [Stal87a] are usually not store and forward networks since each host
can  transmit  directly  to  all  the  other  hosts  but  they  are  packet  switching 
networks  since  it  is  undesirable  to  dedicate  the  transmission  medium  to  a 
single  message  for  an  arbitrarily  long  time. 

A first attempt to integrate voice traffic and data traffic might be to
digitize the analog voice signals, form packets of the digital data, transmit
them along with the data packets, and reconstruct the analog signal at
the receiving end. Except for the conversions between analog signals and
digital data, this is just what is done with a file transferred between two
computers. The problem is that the reconstruction of the voice signal is
a very time sensitive process since a major component of speech is the sound
frequencies. The packets arriving at the receiving end must arrive with great
regularity. This is in contrast to the reconstruction of a data file, in which
the inter-arrival times of the individual packets are of no consequence. There
are, of course, time constraints on the file transfer; for instance the transfer
must be completed before the user gives up hope, and most transfer protocols
have internal time-out limits. But the algorithm for reconstructing a file does
not depend on the inter-arrival times of the packets, while the algorithm for
reconstructing speech depends, fundamentally, on the inter-arrival times of
the data packets.

Figure 2.1: ISDN frame.

A common approach for ISDN networks is that shown in figure 2.1
[Behr84] [Fine86] [Hila84] [Jana84] [Konh84] [Li84] [Suda84] [Will84] [Wong84]
[Goel85] [Schw87]. Time is divided into equal length frames and each frame
is divided into two sub-frames, a voice frame and a data frame. The voice
frame is divided into slots, each one the size of a voice packet. Voice packets
from a particular side of a conversation occupy the same slot of every voice
frame. Thus, the packets arrive at the destination node at regular intervals.
Assigning a slot to a particular conversation is part of the call setup and the
method used to assign slots is a function of the protocol. Data packets can be
transmitted during the data frame according to some appropriate protocol.
On a point-to-point link such as a microwave link or a hard-wired computer
to computer link, data packets can be transmitted one after another for the
duration of the data frame. For multi-access links such as a satellite with
multiple earth stations or a local area network using a broadcast bus, access
to the medium is determined by a multi-access protocol.

The essence of the scheme is that a "circuit" for a voice connection, or
other stream traffic requiring a circuit, is assigned in the form of a
dedicated slot in each frame, while packet-switched traffic uses the remaining
bandwidth on a per-packet basis. There are variations on this basic model.
Most ISDN protocols of this type allow the boundary between voice and
data frames to move within the larger frame, so that the voice traffic can
seize more of the medium's bandwidth if the amount of voice traffic is high
and yield bandwidth to the data traffic when the number of connections is
small. There is also the problem of how to allocate the slots in a frame. For
instance, an ISDN protocol for satellite links analyzed by Suda [Suda84] has
a third subframe of small slots used to reserve voice slots. A protocol for
broadcast links by Goel and Elhakeem [Goel85] uses preassigned frequency
pilot tones to control the size of the voice frame. A designated node
controls the allocation of the bandwidth between voice and data by issuing the
appropriate pilot tone at the beginning of each frame. The node responsible
for controlling the network is one of the voice nodes, but the responsibility
is rotated among the voice nodes over time. Fine and Tobagi [Fine86]
propose a protocol in which the size of the data subframe is constant but
the size of the voice subframe, and consequently of the frame itself, grows as
new voice streams are added, which means that there are fewer frames per
second. To compensate for this, the individual voice packets are made larger
to contain more voice samples; thus the bandwidth dedicated to each
conversation remains constant. Conversely, as voice streams are removed from
the network the voice packets are reduced in number and size. Wong and
Gopal [Wong84] propose a protocol for a token ring in which a supervisor
node starts a frame by issuing a priority token, which allows only high priority
(i.e. voice) packets to be transmitted; once the token has made its
way around the ring, the supervisor issues a normal token that allows data
packets. The sizes of the individual frames can vary, since the supervisor has
to wait for the token before it can start a new frame.
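
The compensation in the Fine and Tobagi scheme can be checked with a little
arithmetic. The sketch below is not taken from [Fine86]; it assumes an
idealized channel of C bits per second with no headers or guard times, a
fixed data subframe lasting D seconds, and n voice streams that each need
r bits per second. If every voice packet carries r*T bits for a frame of
duration T, then T = D + n*r*T/C, so T = D / (1 - n*r/C). All names and
numbers are illustrative.

```python
def frame_for_streams(n_streams, data_subframe_s, voice_rate_bps, channel_bps):
    """Return (frame duration in seconds, voice packet size in bits).

    Idealized model: no packet headers, no guard times, equal voice rates.
    """
    voice_share = n_streams * voice_rate_bps / channel_bps
    if voice_share >= 1.0:
        raise ValueError("voice streams alone would saturate the channel")
    # The data subframe is fixed, so the frame stretches as voice is added.
    frame_s = data_subframe_s / (1.0 - voice_share)
    # Each packet must carry one frame's worth of samples for its stream.
    packet_bits = voice_rate_bps * frame_s
    return frame_s, packet_bits


if __name__ == "__main__":
    for n in (1, 5, 10):
        frame_s, packet_bits = frame_for_streams(n, 0.010, 64_000, 1_000_000)
        print(n, "streams:", round(frame_s * 1000, 2), "ms frame,",
              round(packet_bits), "bits per voice packet")
```

Under these assumptions both the frame and the voice packets grow with the
number of streams, while the bits per second delivered to each conversation
stay fixed at r.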

Researchers have noted that in typical human speech there is silence 60%
of the time [Kuro84b]. In order to reclaim some of the network's
bandwidth, some protocols transmit voice packets only during the talk spurts. An
extreme example of this is a protocol proposed by Li and Majithia [Li84], in
which slots are assigned to voice conversations and the data traffic competes
on a slot-by-slot basis for the silent slots in the conversations. In most of the
protocols the result is that a circuit is established on a per-talk-spurt basis
instead of on a per-conversation basis.

Gonsalves implemented a voice packet protocol on an Ethernet [Gons83]
that coexists with the normal Ethernet activity on the medium. Basically,
during the period in which a voice packet is contending for access to the
link (i.e. waiting for the link to be idle, detecting collisions, and waiting
the prescribed backoff interval) the voice samples continue to collect in the
packet. Thus the voice packets vary in size depending on the amount of
time that the packet was delayed because of other traffic on the network.
Because the receiving node does not know the size of the incoming packets
in advance, it must buffer them, and to compensate for the buffering it must
delay the regeneration of the voice signal. Gonsalves, citing data from AT&T
[ATT80], says that delays on the order of 100 milliseconds are acceptable.
Thus Gonsalves achieves low variance in the playback of the individual
voice samples by introducing a delay in the conversation. This delay allows
the buffering of samples to overcome the variation in packet arrival intervals.
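
The trade of delay for regularity can be illustrated with a small sketch.
This is not Gonsalves' implementation: it assumes fixed-size packets
captured at known times, and a receiver that plays each packet at its capture
time plus one fixed playout delay. The function name and the sample numbers
are hypothetical.

```python
def playout_schedule(packets, playout_delay_ms):
    """packets: list of (capture_time_ms, arrival_time_ms), one per packet.

    Returns a list of (playout_time_ms, on_time) pairs. A packet is on time
    if it has arrived by the moment it is due to be played.
    """
    schedule = []
    for captured, arrived in packets:
        playout = captured + playout_delay_ms
        schedule.append((playout, arrived <= playout))
    return schedule


if __name__ == "__main__":
    # Packets captured every 20 ms; network delays of 5, 35, 20 and 60 ms.
    packets = [(0, 5), (20, 55), (40, 60), (60, 120)]
    print(playout_schedule(packets, 100))  # every packet is on time
    print(playout_schedule(packets, 30))   # second and fourth packets are late
```

With the larger delay the playout times are perfectly regular even though
the arrival intervals are not; with the smaller delay some packets miss
their playout time.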

This highlights an important difference between the requirements of voice
streams and the data streams amongst servos in a robot controller. A voice
stream must have low variance in the inter-arrival times between packets
but can tolerate a fairly large delay between the capturing of a sample and
its regeneration. Servo data, on the other hand, requires a small delay as well
as a small variance. Another significant problem presented by these ISDN
protocols when applied to the communications within a robot controller is
that they support only a single class of real-time traffic, the voice traffic.
This means there is only a single cycle time to handle and that for the most
part there is only a single packet size. In some of the protocols the data
traffic packets must also be of the same size. Often this is just a convenience
for analysis, but in other cases it is integral to the protocol. The real-time
servo traffic expected in a robot controller is likely to have varying cycle
times, and the servo stream packets are likely to be of various sizes. The
packet-switched traffic will be more-or-less normal computer traffic with a
wide range of packet sizes. Also, there is no notion of relative priority within
the voice streams, whereas in a robot controller the various servos are likely
to have different priorities.

2.3.2     Window  Protocols 

Although not as plentiful as research on ISDN, there has been some work on
real-time communication protocols that do address the problems just
mentioned. An important class of real-time protocols, represented here by two
examples, is the window protocols. One problem with traditional access
protocols is that all messages, or at least all nodes, are considered equivalent.
This "fair" attitude tends to ensure that all messages have equal access and
will eventually be served, and thus that all the jobs will have a fair chance at
succeeding. Since time is of secondary importance to throughput in such
networks, this policy is reasonable. In a distributed real-time system, though,
time is of the essence and we would like to ensure that at any one time the
"most important" message is using the network. To decide which of the
outstanding messages at any given time is the most important and give it
access to the network requires some sort of global scheduling. In any system,
deciding what is the most important thing to do next is not easy, and in a
distributed system, deciding on a global basis what is most important to do is
more difficult. Window protocols attempt to do this in a distributed manner
using a single attribute of each message. It is worth noting here that Ganglia
uses a central controller to schedule access to the network and actually does
most of the scheduling off-line. Ganglia, however, is intended for use in a
restricted environment that makes this centralized control feasible, whereas
the window protocols discussed here are intended for a more general class
of distributed real-time systems.

Assume that we have an Ethernet-like network. Each node can detect
transmission by other nodes so that it will not interfere with an ongoing
transmission. In addition, each node can detect when a message it is
transmitting collides with another message, and when a collision is detected it
aborts transmission. For purposes of an example, assume that we wish to
schedule accesses on a first-come/first-serve (FCFS) basis. That is, the
attribute we will use to determine what message is most important is the time
the message was generated. This is not a very good way to determine the
most important message in a real-time system but it will suffice for this
example. Furthermore, each node knows some history of the network
activities, namely it has a bound on the generation time of all untransmitted
messages, and all of the nodes agree on this bound. In other words, there is a
time, t_l, which all the nodes agree upon, and there are no untransmitted
messages in the system generated before t_l. Finally, assume that each node
has a clock, which we will call t, that is synchronized with the clocks of the
other nodes. Maintaining a synchronized clock on distributed nodes is a
difficult problem, but we will assume it for now and see later that it can be
relaxed.

Each  node  performs  the  following  steps: 

1) Wait for the network to be idle (i.e. for completion of the currently
   transmitted message).

2) Set the variable t_b = (t_l + t)/2 (i.e. the mid-point between the oldest
   possible message and the current time).

3) If there is a message, at this node, generated between t_l and t_b, begin
   transmission of the message.

4) There are three possibilities: there is a collision between two or more
   messages, a single message is being transmitted, or the network is idle.
   The cases are handled as follows:

   - In the case of a collision, first abort the transmission from this
     node, if any. A collision means that two or more messages were
     generated in the interval between t_l and t_b. Then adjust the
     window by setting t_b = (t_l + t_b)/2. Then if t_b > t_l go to step
     3. If t_b is not greater than t_l then there are two messages
     generated at the same time; we must resolve this problem by some
     other means, which will be discussed below.

   - In the case of an idle network, we know that there were no
     messages generated between t_l and t_b, so we simply set t_l = t_b
     and proceed to step 2.

   - In the case of a single transmission, we know that there was
     only one message generated between t_l and t_b and it is being
     transmitted, so set t_l = t_b and proceed to step 1.

Basically, this example system works by performing a binary search in
the interval between t_l and t for the oldest message. Figure 2.2 shows how
the protocol works. The top line shows a time line (progressing from left
to right) with t_l, t_b, and t marked. Also shown are the generation times
of three messages, m_a, m_b, and m_c. Both m_a and m_b are in the interval
[t_l, t_b) so they are both transmitted and collide with each other. So the
interval is halved and we try again (the second line of the figure). Note that
t has advanced in the second line, indicating that time is progressing. In
the second line, m_a is alone in the interval so it will be transmitted. After
transmission of m_a the situation is as shown in the third line: t_l has been
updated, t has progressed, and a new message, m_d, was generated while m_a
was transmitted.
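
The steps above can be sketched as a small simulation. This is an
illustration under stated assumptions, not an implementation from the
literature: times are integer ticks, every probe of the window (idle or
collision) costs `slot` ticks, a successful transmission costs `tx_time`
ticks, all generation times are at or after the initial bound of 0, and
simultaneous messages are resolved by the lowest node number. The names
`fcfs_window`, `slot`, and `tx_time` are hypothetical.

```python
def fcfs_window(messages, t_start, slot=1, tx_time=5):
    """Simulate the FCFS window protocol with integer times.

    messages: list of distinct (generation_time, node_id) pairs.
    Returns the messages in the order in which they are transmitted.
    """
    pending = set(messages)
    order = []
    t_l = 0          # agreed bound: nothing untransmitted is older than this
    t = t_start      # the synchronized clock
    while pending:
        t_b = (t_l + t) // 2                       # step 2
        while True:
            # Step 3: every node holding a message in [t_l, t_b) transmits.
            window = [m for m in pending if t_l <= m[0] < t_b]
            if not window:                         # idle network
                t_l = t_b
                t += slot
                break                              # back to step 2
            if len(window) == 1:                   # single transmission
                winner = window[0]
                t_l = t_b
            else:                                  # collision
                t += slot
                halved = (t_l + t_b) // 2
                if halved > t_l:
                    t_b = halved
                    continue                       # back to step 3
                # Same generation time: fall back to the node number.
                winner = min(window, key=lambda m: m[1])
            pending.remove(winner)
            order.append(winner)
            t += tx_time
            break                                  # back to step 1
    return order


if __name__ == "__main__":
    print(fcfs_window([(3, 1), (5, 2), (5, 0), (12, 1)], t_start=10))
```

Because the window always starts at t_l and a message is sent only when it
is alone in the window, the transmission order in this sketch is the
generation order, with ties broken by node number.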

The problem of two messages being generated at the same time must be
dealt with. One solution would be to tack the node number on to the
generation time so that no two messages could have the exact same generation
time. This would create a preference for messages generated at the nodes
with low id numbers, but the preference would be slight and would probably
not disturb the overall performance of the network. Another scheme, proposed
by Zhao, Stankovic, and Ramamritham [Zhao87], is to randomly select new
generation times for the messages involved in the collision and start the
process again.

Figure 2.2: FCFS window protocol.

In this example the initial interval was set to half the distance between
the oldest possible message and the current time. The best choice for the
size of this interval depends on the characteristics of the network. In general,
we would like the initial interval to be the size that is most likely
to contain exactly one message, thus avoiding collisions.

Kurose [Kuro84a] proposes and analyzes a window protocol which uses
message generation time as the metric for scheduling messages. There is a
time constraint that the messages must meet, that is, a message generated
at time t must be transmitted by time t + t_c or it loses all value. This time
constraint can be used in the protocol to improve performance, since any
message that has missed its deadline can be discarded by its node before
it is transmitted; thus bandwidth is not wasted transmitting these useless
messages.
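
A minimal sketch of the discard rule, assuming each node keeps its own
queue of (generation time, payload) pairs and that transmission time is
ignored. The function name and the queue layout are hypothetical.

```python
def drop_expired(queue, now, t_c):
    """Keep only the messages that can still meet the constraint.

    queue: list of (generation_time, payload) pairs held at one node.
    A message generated at time g is useless once now exceeds g + t_c.
    """
    return [(g, payload) for g, payload in queue if now <= g + t_c]


if __name__ == "__main__":
    queue = [(0, "a"), (4, "b"), (9, "c")]
    print(drop_expired(queue, now=10, t_c=5))  # only the message from time 9 survives
```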

So far we have dealt with a single class of time-constrained data where
every message has the same time constraint. However, the protocol can
be extended to handle multiple classes of time-constrained traffic. To do
this a message is allowed to force a lower priority message (which is already
transmitting) off the network by jamming it; then the scheduling procedure
begins with the higher level traffic. In addition, the variables t_l and t_c
must be maintained for each class of traffic. We will add another level
of subscript for the class; thus t_{l_i} and t_{c_i} are the message generation
bound and time constraint for message class i, with i = 0 for the lowest
class of traffic. The lowest priority traffic is the non-time-constrained, for
which the time constraint, t_{c_0}, is infinity. While there are no
time-constrained messages to be sent, the nodes with level 0 messages contend
for the network as described above. At any time the network is idle or a class
0 message is being transmitted, a node generating a level 1 or higher message
can begin transmission. If a level 0 message is being transmitted the
preempting node must transmit a jam signal to force the level 0 node to stop
transmitting. After the higher level message has completed there may be more
messages from that level, which must contend for the network as described
above. Higher level messages can preempt lower level messages in a similar
manner.

In order for this protocol to work all the nodes must be aware at all times
of the global state, which means each node must maintain t_{l_i} and t_{c_i}
for all i and be aware of which class of traffic is currently using the network.
One advantage of this is that when control of the network returns to level i
after it was preempted it is not necessary to go through the contention process
again. The node that was preempted can simply retransmit the preempted
message, since all nodes are aware of the state of the level i traffic. Note
that the nodes do not, in general, know which node was transmitting, but
each node knows that some node was transmitting and whether or not it is
that node.

This protocol defines the "most important" message as the oldest highest
priority message. This is reasonable if the importance of a message can be
adequately represented by a static priority. However, the number of
priorities is quite small. There is a further restriction which makes the protocol
less desirable, namely that the time constraint is assumed to be the same
for all messages in a particular class. Zhao, Stankovic, and Ramamritham
[Zhao87] propose a protocol which removes these problems. This protocol
uses the laxity of a message to rank the messages and determine the "most
important". The laxity of a message, m, is its deadline, d_m, minus the
current time, t. A scheduler that ensures that at any instant the message
with the least laxity (i.e. nearest its deadline) is being transmitted is
optimal in the sense that it will successfully schedule a set of messages if it is
possible to do so [Mok83]. Such a scheduler is known as a minimum laxity
first scheduler. This protocol approximates a minimum laxity first
scheduler. The reasons it can only approximate the optimal scheduler are that
the scheduling process itself takes time and that the messages are not
preempted once collision-free transmission has begun. These conditions mean
that the protocol does not ensure that at every instant the minimum laxity
message is being transmitted.

Figure 2.3 illustrates how this protocol works. The first line of the figure
shows the current time, t, the initial upper boundary for the interval,
t_{b_0}, and the deadlines for two messages, d_{m_a} and d_{m_b}. The initial
value for the upper boundary is t_{b_0} = t + δ, where δ is a system-specific
parameter. The value of δ would be chosen so as to maximize the probability
that exactly one message would have its deadline between t and t + δ. Since,
in the figure, the deadlines for both messages are in the interval, the nodes
containing the messages will both initiate transmission of the messages. This
will cause a collision which will be detected by all nodes. Each node then sets
t_{b_1} = (t + t_{b_0})/2 (retaining the value t_{b_0} in a stack) and checks to
see if it has any messages with deadlines in the interval between t and t_{b_1}.
In this case there are none, so all nodes will detect the idle network and
compute a new upper boundary, t_{b_2} = (t_{b_1} + t_{b_0})/2. This time there
is exactly one message in the interval [t, t_{b_2}) and it will be transmitted.
When the transmission is complete, the scheduling begins again. Now, though,
we know that there is at least one message with its deadline before t_{b_0}, so
we use it as the upper bound on the new interval and the process continues.
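
One round of this search can be sketched as follows. The sketch is only an
illustration of the narrative above, not the protocol of [Zhao87] in full: it
assumes distinct deadlines, ignores the time consumed by each probe, and
simply gives up if no deadline falls within δ of the current time. The names
`laxity_window_round` and `delta` are hypothetical.

```python
def laxity_window_round(deadlines, t, delta):
    """Find the message with the nearest deadline by probing windows.

    deadlines: dict mapping a message name to its deadline (all distinct).
    Returns (winner, probes, next_upper): the chosen message, the number of
    windows probed, and the saved boundary to use as the next upper bound.
    """
    lo, hi = t, t + delta
    stack = []        # upper boundaries of windows that produced a collision
    probes = 0
    while True:
        probes += 1
        window = [m for m, d in deadlines.items() if lo <= d < hi]
        if len(window) == 1:                 # exactly one sender: transmit
            return window[0], probes, (stack[-1] if stack else None)
        if len(window) > 1:                  # collision: halve the window
            stack.append(hi)
            hi = (lo + hi) / 2
        elif stack:                          # idle after a collision
            # The colliding messages all lie between hi and the saved bound.
            # Probing [hi, mid) is equivalent to the text's [t, mid), since
            # everything below hi is now known to be empty.
            lo, hi = hi, (hi + stack[-1]) / 2
        else:                                # nothing due within delta
            return None, probes, None


if __name__ == "__main__":
    print(laxity_window_round({"m_a": 15, "m_b": 17}, t=10, delta=8))
```

With t = 10 and δ = 8, the first window [10, 18) produces a collision, the
second window [10, 14) is idle, and the third window [14, 16) isolates m_a,
mirroring the three lines of figure 2.3. The saved boundary 18 is returned
for use as the upper bound of the next round.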

2.3.3    Ganglia 

The above protocols are intended to serve the rather wide domain of
distributed real-time systems. Ganglia, on the other hand, is intended for
a restricted domain, low-level robot controllers, and we can take advantage
of characteristics of this domain. The window protocols just discussed
assume only a general or stochastic knowledge of the traffic on the network.
This is the traditional communications approach and is reasonable for a
general purpose network, even a general purpose real-time network. They
also implement a decentralized access control protocol. This is the popular
approach in local area networks. It increases the robustness of the network
and facilitates dynamic changes in the network, which is important in a
general purpose network. Ganglia is for use in rather small closed systems,
robot controllers. The most frequent and most important time-constrained
messages are periodic servo messages for which the generation times and
deadlines are known in advance. Thus, it is possible to generate a schedule for
these messages off-line.

A problem presented by the window protocols is that the nodes must
be fairly intelligent. They must continuously monitor the network and keep
track of the global state of the network and, of course, all the nodes must
agree on this state. In practice, a fairly substantial processor at each
node will be dedicated to network access. One might imagine a node in
a robot controller whose sole purpose is to sample a dozen or so
analog-to-digital converters and report their values. To burden this node with a
powerful communication processor is excessive, especially since this node
may be located in a robot hand where space is at a premium. One of
Ganglia's design goals is to be able to incorporate very unsophisticated nodes.
The simplest node need only respond to a specific poll by transmitting its
data.

A Ganglia network must also handle asynchronous real-time messages
as well as normal non-time-constrained traffic. The protocol is able to
accommodate this traffic, although perhaps not as elegantly as the above
protocols.

Figure 2.3: Least laxity first protocol.

Hierarchical design is common in computer programs. The reason is
two-fold. The first is for the user's/programmer's convenience. For instance,
an operating system might present disk storage to the programmer as
sequential and random access files, hiding the specific details of disk access
and file maintenance. On top of this are developed text files, indexed files,
relational databases, and so on. Thus at the top the user or programmer is
presented with "familiar" abstractions suitable for the task at hand.
Secondly, hierarchical programming is good software engineering. If one
considers a system as having the raw devices on the bottom and the application
programs on the top, then hierarchical programming represents vertical
modularity and all the various advantages of modular programming apply. The
specification for each step in the hierarchy can be precisely stated,
simplifying development. Changes in the underlying hardware need not propagate
changes throughout the system but can be handled by the lower levels of the
hierarchy. Programming modifications and optimizations can be applied at
the appropriate level(s), again without disrupting the entire system.

All of this applies to robot control programs as well. Furthermore, our
experience has shown that there is a similarity of structure in each of the
lower levels of the control hierarchy for robot manipulators. Hic exploits
the hierarchical structure and the similarity of the lower levels. It is likely
that similar structures would be found in control programs for robot arms,
or walking robots, or other robot systems, but the discussion in this thesis is
focused on robot manipulators with particular emphasis on the Utah/MIT
hand.

Figure 3.1: Simple joint position servo


3.1     A  Joint  Servo 

Consider a joint position servo; figure 3.1 is a schematic illustration. This
servo consists of three procedures. The input procedure obtains readings
from the joints' position sensors and produces the actual positions. The
input procedure might manipulate the sensor readings: it might convert the
raw sensor values to more meaningful units (radians, for instance), filter the
readings to remove noise, or perform some other manipulation to put
the data in an appropriate form for use by the other parts of the system. In
the figure the input routine is shown placing the actual position values in
a structure known as a periodic data buffer or PDB. PDBs are interesting
structures that play an important role in Hic, but at this point it suffices
to assume that routines can put things (data aggregates) into them and
other routines can at a later time get access to those same things. So in
this example the input routine places actual joint positions into the actual
position PDB and the planner routine, the second procedure, will get those
position values from the PDB. The planner routine is invoked after the input
routine and obtains the actual positions placed on the actual position PDB by
the input routine and target or goal positions from the target position PDB.
It applies appropriate control laws for the joints, producing the trajectories
to be made by each joint, and places them on the trajectory PDB. Finally, the
output routine picks up the trajectories and commands the joint actuators to
perform the corresponding motions. Target positions are not produced by
this servo but are generated by some higher control level which places them
on the target PDB. This sequence of three phases (input, plan, output)
is repeated at a frequency appropriate for the devices being controlled. As
we will see below, the three-phase servo also appears in the higher levels of
the hierarchy and is a recurring motif in the Hic model.
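
A minimal sketch of the three-phase servo, written in Python for
illustration only. It does not use the Hic interface: a plain dictionary
stands in for the three PDBs, a simple proportional rule stands in for the
"appropriate control laws", and the names `make_joint_servo`, `trigger`,
`scale`, and `gain` are all hypothetical.

```python
def make_joint_servo(read_sensors, command_actuators, buffers,
                     scale=0.5, gain=0.5):
    """Build the list of procedures that make up one joint servo.

    buffers is a dict with the keys "actual", "target" and "trajectory".
    """
    def input_proc():
        # Convert raw sensor counts into joint positions.
        buffers["actual"] = tuple(raw * scale for raw in read_sensors())

    def planner():
        # Stand-in control law: move a fixed fraction of the remaining error.
        buffers["trajectory"] = tuple(
            gain * (goal - position)
            for goal, position in zip(buffers["target"], buffers["actual"]))

    def output_proc():
        command_actuators(buffers["trajectory"])

    return [input_proc, planner, output_proc]


def trigger(event):
    """Invoke the procedures of an event one after another."""
    for procedure in event:
        procedure()


if __name__ == "__main__":
    commands = []
    buffers = {"actual": None, "target": (1.0, 1.0), "trajectory": None}
    servo = make_joint_servo(lambda: [0, 2], commands.append, buffers)
    trigger(servo)   # one cycle: input, plan, output
    print(commands)
```

The target values are written by something outside the servo, as in the
text; the servo itself only reads them.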

This example in isolation is rather simple but it illustrates the major
components of Hic. The servo routines collectively are called either a
sequence or an event. An event or sequence consists of an arbitrarily long
list of procedures that are invoked sequentially when the event is triggered.
An event can be triggered in one of three ways: it can be triggered by the
background task or by a procedure in some other event, it can be triggered
by the handler for a hardware interrupt, or it can be scheduled by a time
process and triggered when the scheduled time arrives. This final method
is typically used to trigger servo sequences. The joint servo is flagged as
a periodic event so that when it completes execution the Hic system
automatically schedules it for the next cycle. There are further refinements on
sequences and their procedures, but discussion of these details is postponed
until section 4.6.

It is important to remember, however, that events are not processes in the
normal multi-processing sense; in particular, the procedures in a sequence
cannot block while waiting for I/O to complete. The Hic system itself does
not provide support for or impose structure on the I/O devices, so I/O
is left entirely up to the procedures involved. Our example servo assumes
that obtaining the input values is not a time-consuming multi-stage process.
This is true, for instance, when the inputs are analog signals digitized by an
analog-to-digital converter. Other devices may not be so simple. Suppose
that a joint position device inputs its values by direct memory access and
that to obtain values it were necessary to initiate the acquisition and then
roughly 1 msec later the values would be available in memory. One
millisecond is a long time to busy-wait for the input to complete, so our previous
example might be modified as shown in figure 3.2. Here another event has
been added, the input trigger. The servo loop is scheduled by scheduling
the trigger at the appropriate interval. When executed it initiates the input
device, and then 1 msec later the main body of the servo loop is triggered and
the sequence proceeds as before. The main part of the servo loop could be
triggered in several ways. The two events (the trigger and the body) could
each be scheduled periodically with a 1 msec interval between their
scheduled times. This is dangerous because Hic does not guarantee that events
are executed at the scheduled time, but only that they will be executed
sometime after they are scheduled, depending on the relative priority of the
event with other activities on the processor. Thus there is no assurance that
there is enough time between the execution of the events for the input to
complete. An alternative is to have the trigger event schedule the second
event for 1 msec after it has initiated the input; this can ensure there
is enough time for the input to complete. Yet another alternative could be
used if the input device is able to interrupt the processor when the input is
complete. This interrupt can be used to trigger the main part of the servo
loop. Note that in this case the main event is not scheduled with the timer but
is directly triggered by the input device's interrupt. Which method is most
appropriate is a design choice based on the specific system.
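
The second alternative, in which the trigger event schedules the body, can
be sketched with a toy timer queue. This is not the Hic scheduler: it is a
single-threaded simulation with no priorities or preemption, times are
integer microseconds, and the names `TimerQueue`, `build_servo`, and
`input_delay_us` are hypothetical.

```python
import heapq


class TimerQueue:
    """Toy timer: runs scheduled events in time order, never before they are due."""

    def __init__(self):
        self.now_us = 0
        self._queue = []
        self._count = 0   # tie-breaker so heapq never has to compare callables

    def schedule(self, when_us, event):
        heapq.heappush(self._queue, (when_us, self._count, event))
        self._count += 1

    def run(self):
        while self._queue:
            when_us, _, event = heapq.heappop(self._queue)
            self.now_us = max(self.now_us, when_us)   # time never runs backwards
            event()


def build_servo(timer, log, input_delay_us=1000):
    """Return the input-trigger event; it schedules the servo body itself."""
    def body():
        log.append(("body", timer.now_us))            # input, plan, output go here

    def input_trigger():
        log.append(("trigger", timer.now_us))         # start the acquisition here
        # Schedule the body relative to when the input was actually started.
        timer.schedule(timer.now_us + input_delay_us, body)

    return input_trigger


if __name__ == "__main__":
    timer, log = TimerQueue(), []
    trigger_event = build_servo(timer, log)
    timer.schedule(0, trigger_event)       # two servo cycles, 5 msec apart
    timer.schedule(5000, trigger_event)
    timer.run()
    print(log)
```

Because the body is scheduled from inside the trigger, the 1 msec gap is
measured from the moment the input was started, even if the trigger itself
ran later than its scheduled time.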

Periodic  data  buffers  are  also  shown  in  the  example  systems.  These  are 
message  passing  structures  specially  designed  for  periodically  produced  data 
such as those used by the servo loops. Each PDB is for a particular type
of data: actual values, target values, or trajectories. The basic operations
on PDB's are put and get. Data is put on a PDB, by the input routine
for instance, and these values become the most recent values. A routine
performing a get on a PDB gets a pointer to the most recent values put in the
PDB. In our example the PDB's behave simply as buffers since there is only
one source and one destination for the PDB's and they execute sequentially.
However,  in  a  system  with  multiple  servo  loops  and  supervisory  programs 
running  at  various  rates  on  multiple  processors  pdb's  are  an  efficient  method 
to  make  data  available.  In  particular  there  are  no  timing  or  synchronization 
requirements  between  the  various  routines  accessing  a  PDB.  Data  can  be 
put  on  a  PDB  at  any  time  and  that  data  will  become  the  most  recent  data 
in  the  PDB.  A  get  on  a  pdb  returns  a  pointer  to  the  most  recent  data  in 
the  PDB  and  the  data  is  accessible  by  the  routine  as  long  as  it  keeps  the 
pointer,  that  is,  the  data  will  not  be  overwritten  even  though  new  data  may 
be placed in the PDB. All routines performing gets around the same time
will receive pointers to the same data; the data is actually shared by the
various  processes.  For  this  reason  the  data  received  via  a  get  should  be 
viewed  as  read  only  by  the  process  receiving  it. 

3.2     Example:  Utah/MIT  Hand  Control 

Now, consider a more complete example, figure 3.3. First of all, this example
clearly shows the similarity of structure in the three levels of the hierarchy.
This example is similar to the hierarchy we are developing to control the
Utah/MIT hand in our laboratory. This example is discussed in some detail
both  to  illustrate  how  a  control  program  can  use  HIC  and  to  illustrate  the 
control  scheme  for  the  hand. 

Raw  joint  servo.  The  raw  joint  servo  is  more  complex  than  in  the  first 
example. The input is both the joint angles for the 16 joints and the tensions
on the 32 tendons (two per joint). The input procedure does little
manipulation of the input data. The joint angles are read as 12-bit signed
integers and are converted to 16-bit integers and put in the actual PDB without
further modification. The strain values are 12-bit signed integers; they
are converted by a simple formula to torques at each joint, 16-bit values
stored along with the angles.
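
The widening of a 12-bit signed reading to a 16-bit integer can be sketched as follows. This is only an illustration, assuming the converter delivers a two's-complement sample in the low 12 bits of a word; the function name is hypothetical and the actual data format of the converters is not given here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a 12-bit two's-complement sample, held in the
 * low 12 bits of `raw`, to a 16-bit signed integer. */
static int16_t widen12(uint16_t raw)
{
    int v = raw & 0x0FFF;                 /* keep only the 12 data bits */
    int result = (v ^ 0x0800) - 0x0800;   /* portable sign extension from bit 11 */

    assert(result >= -2048 && result <= 2047);
    return (int16_t)result;
}
```

The exclusive-or and subtract form avoids relying on how the compiler converts out-of-range unsigned values to a signed type.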

Figure 3.3: A three level hierarchy.

Target values are supplied by the next level in the hierarchy, the finger
servo. These values correspond directly to the actual values: there are
16 desired joint angles and 16 desired torques, both in the same units as the
corresponding actual values. The control parameters are parameters for the
planner routine. The planner takes as its inputs the most recent actual
values and the most recent target values and applies an amalgam of force
control and position control on the individual joints using the most recent
control parameters. Gains for both position feedback and force feedback (for
the individual joints) are obtained from the control parameters. Typically
for any one joint one of the two gains is zero, thus reducing the amalgamated
force/position control simply to force or position control. In the figure,
control parameters are shown as coming from the finger servo; actually it is
likely that they may be set by any of the higher levels of control. Finally, the
planner produces trajectories (16-bit integers) for the joint actuators which
are applied by the output procedure without conversion (except for 16-bit to
12-bit integers). For reasons of speed all values in the joint servo are integers
and the units are those used by the analog-to-digital and digital-to-analog
converters, hence the name "raw joint servo".
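
One way to read the "amalgam" of force and position control is as a sum of two proportional terms per joint, with one gain normally zero. The sketch below assumes exactly that form, and assumes the result is clipped to the 12-bit range of the output converter. The actual control formula is not given in the text, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-joint gains taken from the control parameters. */
typedef struct {
    int16_t position_gain;
    int16_t force_gain;
} JointGains;

/* Assumed control law: a sum of proportional position and torque terms.
 * With one gain zero it reduces to pure force or pure position control.
 * All arithmetic is integer, in raw converter units. */
static int16_t joint_command(JointGains g,
                             int16_t actual_angle,  int16_t target_angle,
                             int16_t actual_torque, int16_t target_torque)
{
    int64_t u = (int64_t)g.position_gain * (target_angle  - actual_angle)
              + (int64_t)g.force_gain    * (target_torque - actual_torque);

    /* Assumption: saturate to the signed 12-bit range of the output. */
    if (u >  2047) u =  2047;
    if (u < -2048) u = -2048;

    assert(u >= -2048 && u <= 2047);
    return (int16_t)u;
}
```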

In  this  example  it  is  easier  to  see  the  value  of  PDB's.  The  pdb's  act  as 
buffers  between  the  control  levels  both  in  that  they  hold  data  as  it  passes 
between  levels  and  that  they  reduce  problems  presented  by  the  fact  that  the 
servos  at  different  levels  may  be  operating  at  different  rates  and  may  reside 
on  different  processors.  For  instance,  the  joint  servo  operates  on  a  processor 
by  itself,  with  the  finger  and  object  servos  running  on  a  different  processor, 
and there is an order of magnitude difference between the rate of the joint
servo  and  the  finger  servo.  Debugging  procedures  and  system  monitoring 
procedures  can  make  use  of  PDB's  also.  Although  not  shown  in  the  figure, 
the  pdb's  can  be  accessed  by  any  procedure  in  the  system  so  that  a  monitor 
process  on  the  host  computer  might  monitor  or  record  the  joint  positions 
by  periodically  getting  a  "snapshot"  from  the  actual  pdb.  Any  process  can 
also  perform  puts  on  a  PDB  so  that  a  monitor  or  debugging  process  could 
override  the  finger  servo  and  supply  target  positions  directly  to  the  joint 
servo.  It  is  for  these  reasons  that  there  is  a  joint  trajectory  PDB,  which  on 
the  surface  seems  a  bit  superfluous  since  it  connects  only  the  joint  planner 
and  the  joint  output  routines  that  are  always  run  sequentially  (thus  a  simple 
buffer  could  be  used  instead).  But  putting  the  trajectories  in  a  pdb  allows 
a process to monitor them. In the example the finger servo provides the
control parameters for the joint servo; a likely alternative is that a task level
process might be responsible for the joint level control parameters. PDB's
make  all  of  this  possible  with  little  interference  to  the  basic  system. 

PDB's are a little more complicated to use than presented so far. In addition
to the basic access procedures, put and get, there are two complementary
procedures, reserve and unget. A process that wishes to place values on a PDB
must first reserve a buffer (using the reserve call); the data is placed in the
buffer and then the buffer is put in the PDB. Similarly, when reading from a
PDB a process gets a buffer (actually a pointer to the buffer) and uses the data
as needed. After examining the buffer, the process performs an unget to
indicate that it will no longer access the data and that the data area can be
freed, if no other process is using it.
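
The four access procedures can be sketched with a small pool of data areas and a reference count on each. This is a single-processor illustration only, not the HIC implementation: the field layout, pool size, and function names are assumptions, and a real version would have to protect the updates against preemption and access from other processors.

```c
#include <assert.h>
#include <stddef.h>

#define PDB_AREAS 4          /* assumed pool size, fixed at design time */
#define PDB_WORDS 16         /* assumed payload size */

typedef struct pdata {
    int   refs;              /* readers holding it, plus 1 while most recent */
    int   reserved;          /* set between reserve and put */
    short values[PDB_WORDS];
} PDATA;

typedef struct pdb {
    PDATA  area[PDB_AREAS];
    PDATA *current;          /* most recent data, NULL before the first put */
} PDB;

/* Return an unused data area, or NULL if none is free. Never blocks. */
static PDATA *pdb_reserve(PDB *p)
{
    for (int i = 0; i < PDB_AREAS; i++) {
        PDATA *d = &p->area[i];
        if (d->refs == 0 && !d->reserved) {
            d->reserved = 1;
            return d;
        }
    }
    return NULL;
}

/* Publish a reserved, filled-in area as the most recent data. */
static void pdb_put(PDB *p, PDATA *d)
{
    assert(d->reserved);
    d->reserved = 0;
    d->refs = 1;                    /* the PDB's own reference */
    if (p->current != NULL)
        p->current->refs--;         /* freed once its readers have all unget */
    p->current = d;
}

/* Return a pointer to the most recent data; read-only by convention. */
static const PDATA *pdb_get(PDB *p)
{
    if (p->current != NULL)
        p->current->refs++;
    return p->current;
}

/* Declare that the caller is finished with data obtained from pdb_get. */
static void pdb_unget(PDB *p, const PDATA *d)
{
    size_t i = (size_t)(d - p->area);
    assert(i < PDB_AREAS && p->area[i].refs > 0);
    p->area[i].refs--;
}
```

A reader that still holds a pointer keeps its data area out of the free pool, which is why a put never overwrites data that is in use.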

Finger servo. As was remarked earlier, the finger servo is similar in structure
to the joint servo. There are three procedures: input, planning, and
output. Input to the finger servo consists mainly of the joint positions from
the lower level joint servo. Forward kinematics for the fingers are applied
to the joint positions to obtain the positions of the finger tips in Cartesian
coordinates. Using the torques on the individual joints, an estimate of the
forces being applied by the finger tip can be obtained. Only an estimate is
possible because the joint torques alone are not enough to determine the finger
tip forces. The addition of tactile sensors to the finger tips would allow
a more accurate determination of the forces on the fingers. Input from these
sensors would be appropriate at this point. The results are placed in a PDB
for the actual finger tip positions and finger tip forces. The planning procedure
picks up the actual position and force values and the corresponding
targets. Then, using an appropriate control formula, it produces trajectories
for the output procedure. It is then up to the output routine to apply the
inverse kinematics and produce target values for the joint servo.

Note  that  in  figure  3.3  there  is  no  parameter  PDB  for  the  finger  servo 
(or the object servo). This is more a matter of aesthetics than design. Any
control scheme is likely to have operation parameters and if these change
while the system is in operation then a PDB should be used to hold the
parameter values. But adding these parameter PDB's to the figure complicates
the picture without significantly increasing the illustrative value. So
the parameter PDB's have been left out of the figure.

Object servo. The object level servo is not so intuitive. When the fingers
have a fixed grip on a rigid object then there is a simple coordinate
transformation between the position of the fingers and the position and
orientation of the object. It is possible by a linear transformation to deduce
the  external  forces  applied  to  the  object  from  the  forces  sensed  at  the  finger 
tips  (the  points  of  contact  with  the  object)  [Demm88].  Note  that  not  all 
forces  sensed  by  the  fingers  are  due  to  external  forces  since  the  fingers  must 
apply  force  to  maintain  a  grasp  of  the  object. 

In a very real sense the entire system (joints, fingers, and object) becomes
a single finger. Consider a person writing with a pencil. The pencil is gripped
firmly and the forces transmitted through the pencil to the fingers are used,
in part, to infer the nature of the writing surface and the amount of force
to apply. Manipulation of an object or tool in which the grasp is fixed and
a single control scheme is used on the tool is called a homogeneous manipulation
or homogeneous task [Demm88]. Homogeneous tasks can be strung
together, with appropriate transitions, to make more complex tasks. Thus
writing  can  be  decomposed  into:  grasping  the  pencil  (not  homogeneous); 
transporting  it  to  the  paper  (a  homogeneous  task);  applying  the  pencil  to 
the  paper  (homogeneous);  and  making  marks  on  the  paper  via  a  sequence  of 
strokes  (each  homogeneous).  This  highlights  an  important  feature  of  the  Hic 
system,  namely  that  the  procedures  in  an  event  are  "pluggable  modules".  In 
the  end,  as  far  as  the  HIC  system  is  concerned,  the  object  level  planner  is  just 
a  pointer  to  the  actual  procedure  and  a  higher  level  process  can  change  the 
pointer  as  desired.  Thus  there  may  be  a  planning  procedure  for  maintaining 
a  grip  while  the  hand  is  in  free  motion,  another  planner  for  bringing  the 
pencil into contact with the paper, and perhaps different planners for the
different types of strokes made while writing. So managing the writing task
is now largely a matter of switching from one planning procedure to another
at the appropriate times and providing the appropriate parameters. The
planning modules are not the only ones that are pluggable; any procedure
in a sequence can be replaced in the same way, so if it were useful, the input
or output procedures could be changed as needed during a task.
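
The idea of a pluggable module can be illustrated with a function pointer that a higher level process overwrites between cycles. The slot type, the two stand-in planners, and the helper names below are hypothetical and serve only to show the mechanism.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ProcEntry)(void *argument);

/* Hypothetical slot in an event's procedure sequence. */
typedef struct {
    ProcEntry entry;      /* the procedure currently plugged in */
    void     *argument;   /* passed to it on every call */
} ProcSlot;

/* Two stand-in planners; each only records which one ran. */
static void plan_free_motion(void *arg) { *(int *)arg = 1; }
static void plan_contact(void *arg)     { *(int *)arg = 2; }

/* What the system does on each cycle: call through the pointer. */
static void run_slot(const ProcSlot *s)
{
    assert(s->entry != NULL);
    s->entry(s->argument);
}

/* What a higher level process does to move to the next phase of a task. */
static void plug(ProcSlot *s, ProcEntry new_entry)
{
    s->entry = new_entry;
}
```

Because the servo event only ever calls through the pointer, switching from one homogeneous task to the next costs a single pointer store.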

Now  we  return  to  the  example  in  figure  3.3.  During  a  homogeneous 
manipulation the object servo input procedure takes the actual positions
of the finger tips and calculates the position and orientation of the object.
Also the forces at the finger tips are used to calculate the external forces
on  the  object.  The  planner  then  compares  these  with  the  target  values  and 
calculates  an  appropriate  trajectory  for  the  object.  The  output  procedure 
transforms  the  object  trajectory  into  targets  for  the  finger  servo. 

Higher  level.  Where  do  the  object's  targets  come  from?  They  come  from 
higher  level  processes.  The  higher  level  processes  are  less  likely  to  fit  the 
"closed  form"  described  in  the  Introduction  and  therefore  less  well  suited  for 
HIC.  They  will  run  under  more  conventional  real-time  operating  systems, 
such  as  Sage  [Salk88]  or  perhaps  on  the  host  computer.  However,  these 
control levels will have access to the PDB's; in fact the PDB's will be the
primary  interface  between  the  higher  level  control  and  the  low  level  servo 
loops. 
This chapter presents the implementation of HIC. First we will briefly
examine the hardware in the hand's control system. Then the HIC software
implementation is described.

Hic is not a complete operating system in itself. It must run "on top of"
another  operating  system  that  provides  the  nuts  and  bolts  of  the  interface  to 
the  bare  system.  The  current  implementation  of  Hic  uses  Condor  to  provide 
the  low  level  interface. 


4.1     Hardware 

The heart of the system for controlling the hand consists of four Motorola
68020 processor boards on a VME bus (see figure 4.1). The boards are manufactured
by Ironics (IV-3201) [Iron86] and each has a 68020 processor and
1 megabyte of memory. They are numbered from 0 to 3. The memory is
dual-ported so it is accessible both by the processor and by other VME bus
masters (i.e. the other processors). The processors support mailbox interrupts
for inter-processor communication. Writing to a known location in a
processor's local address space causes an interrupt to the processor [Iron86].
Condor implements a remote procedure call protocol using the Ironics mailbox
interrupt, by which any of 256 previously specified procedures can be
invoked; the procedure is given a single word argument and can return a
single word value. Each processor has a single interval timer.
Figure 4.1: Utah/MIT hand control hardware.


Another Ironics board, the IV-3273 System Controller, is the VME bus
arbitrator, which is responsible for arbitrating requests for access to the bus
by the processors. The IV-3273 also contains the connections to the console
terminal for the processors. Unlike most single board computers, the Ironics
boards do not have individual console connections; they must share the
single console on the system controller board. In practice only one of the
processors  (processor  0)  uses  the  console.  Pseudo  terminal  connections  to 
the  host  computer,  based  on  those  provided  by  Condor,  can  be  used  as 
console  terminals  for  the  processors. 

A  Sun  3/160  is  the  host  system  for  the  hand  controller.  It  is  connected 
to the VME bus via a bus connector from HVE Engineering (HVE Synergist
III) [HVE86]. This makes all of the memory on the VME bus visible
to the Sun. Programs for the controller are compiled on the Sun using the
Sun's own compiler and are downloaded directly to the appropriate processor's
memory. The processor can then be started via a mailbox interrupt.
Once running, the controller programs and the host can communicate via
the shared memory on the VME bus. Condor also allows the controller's
processors to cause mailbox interrupts on the host and vice versa. In particular,
the host has access to HIC periodic data buffers on the controller which
allow it to monitor and interact with the processes. Another feature provided
by Condor is support for the GNU debugger (gdb) [Stal87b], which is a
source level debugger for programs written in C running on the controller's
processors.

The  hand's  actuators  and  sensors  are  connected  to  an  analog  controller. 
The  controller  contains  analog  servos  for  position,  velocity,  and  torque.  It 
provides as output analog signals for current position, current velocity, and
the tension on each of the tendons (two per joint). It accepts as input analog
signals for the desired position, position gain, velocity gain, and gains for
the tendon tension. An optional mode allows direct access to the actuators,
bypassing the analog servo loops. The analog controller is connected to
the digital controller via digital-to-analog and analog-to-digital converters
on the VME bus. These are manufactured by Data Translation (models
DT1406  &  DT1401)  [Tran86a]  [Tran86b]. 

Multi-processor issues. Hic is designed to run on a multiple-processor
shared-memory architecture. In particular, periodic data buffers are accessible
across processor boundaries. There are also inter-processor mailbox
interrupts in Condor, which underlies HIC. The Sun host processor communicates
with the HIC system via both PDB's and mailbox interrupts. The
processors run independently. Each processor has its own stack, timer, priority
list, scheduled list, and set of events, which are statically assigned to
the processors. There are no built-in provisions in HIC for load balancing or
for events to migrate from one processor to another.

The processor boards on which HIC has been developed do not have any
form of memory management. This complicates shared memory communication
to some extent. One result of this is that a PDB and all of its
associated data areas (PDATA structures) must reside in the local memory of
a single processor, a mild restriction.

4.2     Hic  Software

The HIC operating system does not provide multi-processing facilities in the
normal sense. The most striking difference is that there is only a single stack
per processor, with the immediate consequence that processes cannot block,
since a blocked process prevents any of the preempted processes below it on
the stack from continuing execution. Processes can be preempted, however.
If a process with higher priority than the current process is triggered then
the current process is suspended and the higher priority process is allowed to
proceed. And since the new event will not block, it will run to completion,
unless, of course, an event with still higher priority is triggered. It is easy
to see that this ensures that at any point the highest priority "ready" event
is running (on a per processor basis). This predictable behavior is very
desirable in real-time operating systems. Thus, by prohibiting blocking, HIC
avoids the problem of priority inversion [Sha86].

Figure 4.2: Example of priority inversion.

Consider an example (figure 4.2) of three processes, P1 (low priority),
P2, and P3 (high priority), and a shared resource (represented by
the semaphore S) in a normal multi-tasking system with simple priority
scheduling.

•  At time t0, P1 is executing.

•  At t1, P1 performs P(S) and enters its critical section (represented by
the angled hash lines).

•  At t2, P2 preempts P1 and executes until t3 when P3 preempts P2.

•  At t4, P3 performs P(S), but blocks because P1 is holding S. When
P3 blocks, P2 is resumed since it is the highest priority runnable process.
This is the priority inversion (represented by the horizontal hash
lines).

•  At t5, P2 completes and P1 is resumed in the midst of its critical
section, which completes at t6, when P1 performs V(S).

•  At t6, when P1 completes its critical section, P3 acquires the
semaphore S and enters its critical section, which continues
until t7, when V(S) is performed and P3 continues.

During the interval from t4 to t5, P3 is waiting for P1 to complete its critical
section but P2 is executing. This is the priority inversion since P3 is now
essentially preempted by P2, although P2 is of lower priority. Note that the
interval from t4 to t5 can be arbitrarily long. In a real-time system with hard
deadlines this is intolerable since a time critical process can be blocked for
unbounded intervals by lower priority processes. P3 is also blocked during
the interval from t5 to t6 but this is an unavoidable side effect of resource
sharing. In addition, the length of this delay can be bounded since in a well
designed system the critical sections are short and well known so that the
designer of P3 can take them into account.

There are ways to avoid priority inversion short of the draconian method
of prohibiting blocking taken by HIC. A straightforward technique is priority
inheritance [Sha86], in which a process holding a resource inherits the
priority of the highest priority task blocked on that resource (this is the
technique used by SAGE). Figure 4.3 shows the same example with priority
inheritance. Note that the same amount of work is performed in the same
time but that when P3 blocks at t4, P1 inherits the priority of P3 and
therefore takes precedence over P2. P1 is then able to complete its critical
section and P3 is able to continue as soon as is possible. Since a process is
never blocked for a period longer than a critical section it is possible to predict
the maximum amount of time that a process will block (assuming that
the number of critical sections is small and each critical section is bounded
in execution time).
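
The scheduling decision at t4 can be reproduced with a small model of the three processes. The sketch below is an illustration of the inheritance rule only, not code from SAGE or HIC; the task record and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NTASKS 3

typedef struct {
    int  priority;    /* nominal priority, larger is more urgent */
    bool ready;       /* wants the processor */
    bool holds_s;     /* currently holds semaphore S */
    bool waits_s;     /* blocked in P(S) */
} Task;

/* Priority used for scheduling. With inheritance, the holder of S runs at
 * the highest priority among itself and the tasks blocked on S. */
static int effective_priority(const Task t[], int i, bool inherit)
{
    assert(i >= 0 && i < NTASKS);
    int p = t[i].priority;
    if (inherit && t[i].holds_s)
        for (int j = 0; j < NTASKS; j++)
            if (t[j].waits_s && t[j].priority > p)
                p = t[j].priority;
    return p;
}

/* Index of the task a simple priority scheduler would run, or -1 if none. */
static int pick(const Task t[], bool inherit)
{
    int best = -1, best_p = 0;
    for (int i = 0; i < NTASKS; i++) {
        if (!t[i].ready || t[i].waits_s)
            continue;                      /* not runnable */
        int p = effective_priority(t, i, inherit);
        if (best < 0 || p > best_p) {
            best = i;
            best_p = p;
        }
    }
    return best;
}
```

With P1 holding S and P3 blocked on it, `pick` selects P2 when inheritance is off (the inversion) and P1 when it is on.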

Figure 4.3: Same example with priority inheritance.

Hic takes the approach of prohibiting blocking for efficiency reasons. It
is able to run with a single stack per processor. Since there is only one
stack, there is very little state information that needs to be saved during a
context switch. In normal circumstances when an event completes it simply
returns to the scheduler, which searches for the next highest priority event
that is ready and invokes it with a subroutine call. If an event of lower
priority is triggered either by the current process itself or by an interrupt
procedure, the event is simply marked as ready and nothing else need be done.
An event of higher priority than the current event can be triggered in two ways:
the current event can trigger it, in which case the event is immediately invoked
with only a little more overhead than a subroutine call, or the event
can be triggered by an interrupt handler, where the overhead is the same but
with the added overhead of the interrupt invocation.

Why is it reasonable to prohibit blocking in HIC? The answer is that the
lowest levels of robot control hierarchies are usually servo loops which can
follow the HIC form of input, compute, output described in the Introduction.
Hic's approach to resource allocation is that a process must have all of the
resources it requires allocated before it is initiated, and that the triggering
of an event is essentially a signal that all resources for a process are
available. This is not quite true though. Processes in HIC are allowed to
access periodic data buffers at any time. This is permitted because the PDB's
are designed to always return something "reasonable" without blocking. In the
case of getting from a PDB, "reasonable" means that the most recently available
values are returned (possibly the same values as obtained by the last access),
which for servo loops is often acceptable and preferable to having the loop
block while waiting for new values. In the case of reserving a data area to put
something in a PDB, the reserve routine will return an empty data area if one
is available; otherwise a null pointer is returned. To ensure that empty data
areas are always available for processes reserving them, a sufficient number
of data areas must be allocated when the system is designed.

Figure 4.4: Event structure (EVENT).

4.3     Events 

Figures 4.4 and 4.5 show the C structures for HIC events and procedures
respectively. Each event in the system has an associated structure and each
event has a list of procedures to be invoked when the event is executed.

Next_scheduled_event and next_priority_event are pointers for the
two  lists  which  contain  events.  These  lists  are  described  in  more  detail 
below. 

Flags contains flags associated with the event. One of the flags,
EVENT_SCHEDULED, is set if the event has been scheduled in the timer list (see
below). Another, EVENT_READY, is set if the event is ready to execute, and
EVENT_ACTIVE indicates the event is executing or has been preempted (i.e. that
the event is holding space on the stack). EVENT_PERIODIC indicates the event is
a periodic event and should be automatically rescheduled. EVENT_PERMITTED
is set to indicate that the event can be executed; otherwise execution is
inhibited.

Id is a system wide unique identifier for the event. If the event must be
"well known" an id can be assigned by the user when it is created; otherwise
the system assigns a unique id.

Name  is  an  ASCII  string  naming  the  event  for  debugging  purposes. 

Dispatch_count is a counter incremented each time the event is dispatched,
for instrumentation purposes.

Priority  and  rank  make  up  the  priority  of  the  event.  Higher  values 
indicate  higher  priority.  Priority  is  the  nominal  priority  which  can  be 
assigned  by  the  user  when  the  event  is  created  and  falls  in  the  range  1 
to 99. If the event is periodic the system will assign a standard priority
based  on  the  periodic  rate,  if  desired.  If  all  periodic  events  are  assigned 
the  standard  priorities  then  they  will  be  prioritized  in  rate  monotonic  order 
(more  frequent  events  have  higher  priority). 
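
Rate monotonic ordering itself is easy to state in code. The sketch below is not the rule HIC uses to derive its standard priorities, which is not given here; it is a generic illustration that sorts a set of periodic events by period and hands out priorities from an assumed top value downward. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long interval;    /* period in timer ticks */
    int  priority;    /* filled in below; larger is more urgent */
} PeriodicEvent;

/* qsort comparison: shorter period first. */
static int by_interval(const void *a, const void *b)
{
    long x = ((const PeriodicEvent *)a)->interval;
    long y = ((const PeriodicEvent *)b)->interval;
    return (x > y) - (x < y);
}

/* Sort by period and assign priorities from `top` downward, so a shorter
 * period always gets a higher priority; equal periods share a priority. */
static void assign_rate_monotonic(PeriodicEvent ev[], int n, int top)
{
    qsort(ev, (size_t)n, sizeof ev[0], by_interval);
    int p = top;
    for (int i = 0; i < n; i++) {
        if (i > 0 && ev[i].interval != ev[i - 1].interval)
            p--;
        assert(p >= 1);           /* stay inside the 1 to 99 range */
        ev[i].priority = p;
    }
}
```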

For technical reasons, HIC requires that each event actually have a unique
priority (this is for efficiency in the scheduler). When an event is entered into
the system it is assigned a unique rank, rank, based on its nominal priority,
priority; rank is actually used by the system when comparing priorities of
events. If two events are assigned the priority 20, for example, the first one
will have rank 2099 and the second will have rank 2098. An unfortunate
consequence  of  this  restriction  is  that  events  with  the  same  priority  will 
not  exhibit  "round-robin"  behavior  as  might  be  expected.  That  is,  if  an 
event  is  triggered  that  has  the  same  priority  as  the  currently  executing 
event  it  is  in  general  unknown  whether  the  triggered  event  will  preempt 
the  current  event  (if  it  happens  to  have  higher  rank)  or  whether  it  will  be 
executed  after  the  current  event  completes  (if  it  has  lower  rank).  Since  the 
distinction  between  the  priority  and  the  rank  of  an  event  is  just  a  technical 
matter  of  implementation  we  will  use  the  more  meaningful  phrase  "priority 
of  an  event"  and  leave  it  to  the  reader  to  remember  the  subtle  distinction 
between  priority  and  rank. 
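
The 2099 and 2098 example suggests a simple rule for deriving a rank: the nominal priority supplies the hundreds and the order of entry counts down from 99. The sketch below encodes that inference; the rule, the limit of 100 events per priority level, and the names are assumptions drawn from the example rather than from the HIC source.

```c
#include <assert.h>

#define MAX_PRIORITY 99

/* Number of events already entered at each nominal priority (1..99). */
static int entered[MAX_PRIORITY + 1];

/* Assumed rule: rank = priority * 100 + 99 - (events already at this level). */
static int assign_rank(int priority)
{
    assert(priority >= 1 && priority <= MAX_PRIORITY);
    assert(entered[priority] < 100);      /* at most 100 events per level */
    return priority * 100 + 99 - entered[priority]++;
}
```

Under this rule an event entered earlier always outranks a later one of the same nominal priority, and every rank at one priority level lies below every rank at the next level up.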

Event  priorities  are  essentially  static.  It  is  possible  to  change  the  event 
priority  list  as  discussed  below  but  to  do  this  one  must  "step  outside"  the 
HIC system and this cannot be done "on the fly".

Scheduled_time, nominal_time, and interval are used to schedule
events in a timer queue and reschedule periodic events. An event can be
scheduled to be triggered at a particular time, in which case it is inserted in
the list of scheduled events. Scheduled_time holds the time when the event
should be triggered.

When a periodic event completes execution it is automatically rescheduled at
the time nominal_time plus interval, and then nominal_time is set
to this new value. The purpose of nominal_time is to help detect system
overload and missed deadlines. If the processor is so overloaded that a
periodic event misses a complete cycle then when it is rescheduled the time
(i.e. nominal_time + interval) will be less than the current time (in other
words it will be scheduled in the past). Hic does not allow events to be
scheduled in the past so it will schedule the event to be executed at the
current time. In this situation, nominal_time and scheduled_time will differ.
This is an indication that the processor is falling behind. Hic itself does not
attempt to correct the problem; that is left to the user.
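
The rescheduling rule and the overload indication can be sketched as follows. The three field names come from the EVENT description; the tick type, the structure, and the function are hypothetical, and the sketch follows only what the paragraph above states.

```c
#include <assert.h>
#include <stdbool.h>

typedef long hic_time;    /* hypothetical tick type */

typedef struct {
    hic_time scheduled_time;
    hic_time nominal_time;
    hic_time interval;
} PeriodicTimes;

/* Reschedule a periodic event that has just completed at time `now`.
 * Returns true if the event has fallen behind, that is, if nominal_time
 * and scheduled_time differ after rescheduling. */
static bool reschedule(PeriodicTimes *e, hic_time now)
{
    assert(e->interval > 0);
    e->nominal_time += e->interval;        /* ideal next trigger time */
    e->scheduled_time = e->nominal_time;
    if (e->scheduled_time < now)           /* never schedule in the past */
        e->scheduled_time = now;
    return e->scheduled_time != e->nominal_time;
}
```

A user-level monitor could act on the return value, for example by lowering the rate of a servo loop, since HIC itself takes no corrective action.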

Procedure_list is the head of the list of procedures to be performed
when the event is executed. The basic operation is that the procedures are
executed in the sequence presented; however, as we will see below, there can
be exceptions.

An event is actually triggered by calling hic_dispatch() with a pointer
to the EVENT structure for the event. If the event is of higher priority than
the currently executing event then hic_dispatch() invokes the new event
directly; otherwise the event's ready flag is set and the event will be invoked
in the proper order.
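
The dispatch decision can be sketched as a nested subroutine call on the single stack. This is a simplified model, not the hic_dispatch() source: the EVENT structure is cut down to a few fields, the flag values are assumptions, a single function pointer stands in for the procedure list, and the scheduler pass that later runs the ready events is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define EVENT_READY  0x02u    /* flag values are assumptions */
#define EVENT_ACTIVE 0x04u

typedef struct event {
    unsigned flags;
    int      rank;                        /* unique priority, larger first */
    int      dispatch_count;
    void   (*body)(struct event *self);   /* stands in for the procedure list */
} EVENT;

static EVENT *current_event = NULL;       /* event running on this processor */

/* A higher priority event is invoked at once as a nested call; anything
 * else is only marked ready for the scheduler to pick up later. */
static void sketch_dispatch(EVENT *e)
{
    assert(e != NULL && e->body != NULL);
    if (current_event == NULL || e->rank > current_event->rank) {
        EVENT *preempted = current_event;
        current_event = e;
        e->flags |= EVENT_ACTIVE;
        e->dispatch_count++;
        e->body(e);                       /* runs to completion, never blocks */
        e->flags &= ~EVENT_ACTIVE;
        current_event = preempted;        /* the preempted event resumes */
    } else {
        e->flags |= EVENT_READY;
    }
}

/* Example events: a servo that triggers one lower and one higher event. */
static void noop_body(EVENT *self) { (void)self; }
static EVENT logger   = { 0u, 1099, 0, noop_body };
static EVENT alarm_ev = { 0u, 3099, 0, noop_body };

static void servo_body(EVENT *self)
{
    (void)self;
    sketch_dispatch(&logger);     /* lower priority: only marked ready */
    sketch_dispatch(&alarm_ev);   /* higher priority: runs immediately */
}
static EVENT servo = { 0u, 2099, 0, servo_body };
```

Dispatching `servo` in this model runs `alarm_ev` inside the servo's own call frame and leaves `logger` flagged ready, which is the behavior the paragraph above describes.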

4.4     Procedures 

Figure  4.5  shows  the  structure  for  HIC  procedures. 

Next  points  to  the  next  procedure  in  the  list  of  procedures  in  this  event. 

Id  and  name  are  similar  to  the  same  fields  in  the  EVENT  structure.  Id 
is a unique identifier which can be assigned automatically and name is an
ASCII  name  for  debugging  purposes. 

Invocation_count  is  incremented  each  time  the  procedure  is  executed. 
It  is  for  instrumentation  purposes. 

Count, reset, and initial provide some scheduling capability internal
to an event. When a procedure is created count is given the value initial.
As the HIC scheduler steps through the list of procedures for an event it
decrements count; if count is still greater than zero the corresponding
function is not invoked. If count is zero or negative then the function is
invoked and count is set to reset.

/*
    PROC  the procedure structure.
*/
typedef struct proc
{
    struct proc*  next;
    int           id;
    char*         name;
    int           invocation_count;
    int           count;
    int           reset;
    int           initial;
    void          (*entry)();
    void*         argument;
} PROC;

Figure  4.5:  Procedure  structure  (PROC). 


An example of where this might be used is for a safety or monitor
procedure associated with a servo event. Suppose that the monitor procedure
need be executed only every fifth time the servo is invoked. Then reset
should be set to five. As another example suppose that input values for a
servo loop are to be sampled 300 times a second but to reduce the effects of
noise on the inputs three samples are averaged before they are given to the
planning procedure. Then the servo event should be scheduled every 1/300
of a second and reset for the input procedure is set to one so it is invoked
every time the servo is triggered. For the planning procedure and output
procedures reset is set to three so they are invoked only every third time.
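
The count and reset mechanism can be illustrated with the following sketch.
It is not the HIC scheduler itself: the function run_procedures() and the
callback count_call() are hypothetical, the PROC structure is cut down to
the fields that matter here, and entry is given a modern prototype.

```c
#include <stddef.h>

/* Reduced version of the PROC structure of figure 4.5. */
typedef struct proc {
    struct proc *next;
    int invocation_count;
    int count, reset, initial;
    void (*entry)(void *);
    void *argument;
} PROC;

/* Step once through the procedure list of an event. Each procedure's
   count is decremented; the procedure runs only when count has reached
   zero (or below), after which count is reloaded from reset. */
void run_procedures(PROC *list)
{
    for (PROC *p = list; p != NULL; p = p->next) {
        p->count--;
        if (p->count > 0)
            continue;               /* not due on this invocation */
        p->invocation_count++;
        p->entry(p->argument);
        p->count = p->reset;
    }
}

/* Example callback: increments the int that argument points to. */
void count_call(void *argument)
{
    (*(int *)argument)++;
}
```

With initial and reset both set to one a procedure runs on every invocation
of the event; with both set to three it runs on every third invocation, as
in the sampling example above.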

Entry  points  to  the  C  function  to  be  invoked  for  this  procedure.  When 
it  is  called,  it  is  passed  the  pointer,  argument. 

4.5     Dispatching  and  Scheduling 

There  are  two  lists  for  events,  the  priority  list  and  the  scheduled  list.  The 
priority  list  is  a  linked  list  of  events  in  priority  order  with  the  head  of  the 
list  being  the  highest  priority  event.  Events  are  dispatched  from  the  priority 
list.  Figure  4.6  shows  how  the  dispatcher  and  the  priority  list  interact.  The 
list  is  implemented  as  a  linked  list  but  is  shown  here  as  an  array  to  simplify 
the figure. The events are in priority order with event 1 being the highest
priority, and the events are represented by their states: Active (in the midst
of execution and holding space on the stack), Ready (ready to execute), and
blank (not ready or idle).

Figure 4.6: Priority list and the dispatcher.

Priority list. In figure 4.6a there is one instance of the dispatcher with
event 4 currently active and event 5 ready to execute. In figure 4.6b, event 4
has completed and the dispatcher has moved on to event 5. While event 5
is executing the system timer interrupts, searches the list of scheduled
processes, and finds three events whose time has expired (2, 3, and 7). These
events are readied and since event 2 is higher in priority than the current
event, 5, a second instance of the dispatcher is invoked on event 2. This is
the state of the system in figure 4.6c. When event 2 completes, its dispatcher
(the second instance) moves on to event 3 (4.6d). Then when event 3
completes the second dispatcher searches down the list for another ready
process but it encounters the active process 5, which causes the second
dispatcher to exit, returning eventually to the first dispatcher which
continues executing event 5 (4.6e). Finally, when 5 completes the dispatcher
finds event 7 ready and executes it (4.6f). If the dispatcher falls off the
end of the list it performs a phantom event, the background task. In general,
any number of instances of the dispatcher can be running and for each
dispatcher there is one active event. When an event exits its dispatcher
searches down the priority list until either a ready event is encountered,
which is then executed, or an active event is encountered in which case the
dispatcher exits.
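
The search rule in the last sentence can be written down compactly. The
sketch below is an illustration only; it uses an array of states, as figure
4.6 does, rather than the linked list of the real implementation, and the
names ev_state, next_event() and DISPATCH_EXIT are hypothetical.

```c
/* State of one entry in the priority list, highest priority first. */
typedef enum { EV_IDLE, EV_READY, EV_ACTIVE } ev_state;

#define DISPATCH_EXIT (-1)

/* Called by a dispatcher instance when the event at index `finished`
   has completed. Returns the index of the next event to execute,
   DISPATCH_EXIT if an active event is met first (this instance must
   return to the dispatcher that was preempted), or n if the search
   falls off the end of the list (the background task runs). */
int next_event(const ev_state *list, int n, int finished)
{
    for (int i = finished + 1; i < n; i++) {
        if (list[i] == EV_READY)
            return i;
        if (list[i] == EV_ACTIVE)
            return DISPATCH_EXIT;
    }
    return n;
}
```

Replaying figure 4.6 with this function gives the same sequence as the text:
after event 2 the second dispatcher selects event 3, after event 3 it meets
the active event 5 and exits, and after event 5 the first dispatcher selects
event 7.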

Notice that since every active process is holding some of the processor's
stack, these processes cannot be removed from the system until they
complete their execution and release the stack space they hold. This makes
changing the priority list a risky proposition without thorough knowledge
of the states of all the events. However, an event while executing can be
sure that there are no events of higher priority active, so if an event is sure
it will not be preempted it can change the priority list between itself and
the top of the list. The proper place to reconfigure the priority list is in the
background task. The background task can be sure that no event in the list
is active so the entire list can be modified. It must first make sure it will
not be preempted by any of the events in the list. This is usually done by
disabling interrupts while the list is manipulated.

What happens if an event is triggered while it is active? If the dispatcher,
hic_dispatch(), is called with the current event as its argument it simply
sets the ready flag, EVENT_READY, since the priority of the triggered event
is not greater than the priority of the current event (since the triggered
event is the current event). When the current event completes execution
the dispatcher checks its ready flag and if it is set the event is invoked again.
Thus a malicious or malformed event could lock out all lower priority events
by re-triggering itself each time it executes.
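
A small model of this behavior is sketched below. It is not the code of
hic_dispatch(): the names event, trigger() and invoke() are hypothetical,
a smaller priority number is assumed to mean a higher priority, and the
ready flag is assumed to be cleared at the moment the event is invoked.
Events that are merely marked ready would be picked up later by the
dispatcher's search of the priority list, which is not modelled here.

```c
#include <stddef.h>

typedef struct event {
    int priority;       /* smaller value = higher priority (assumed)  */
    int ready;          /* stands in for the EVENT_READY flag         */
    int invocations;    /* number of times the body has been executed */
    void (*body)(struct event *self);
} event;

static event *current = NULL;   /* event now executing, NULL if none */

/* Execute an event, and execute it again for as long as it was
   re-triggered while it was running. */
static void invoke(event *ev)
{
    event *saved = current;
    do {
        ev->ready = 0;
        current = ev;
        ev->invocations++;
        if (ev->body != NULL)
            ev->body(ev);
        current = saved;
    } while (ev->ready);
}

/* Trigger an event: run it at once if it outranks the current event,
   otherwise only mark it ready. */
void trigger(event *ev)
{
    if (current == NULL || ev->priority < current->priority)
        invoke(ev);
    else
        ev->ready = 1;
}

/* Example body: re-triggers itself during its first two invocations. */
void retrigger_twice(event *self)
{
    if (self->invocations <= 2)
        trigger(self);
}
```

An event whose body always re-triggered itself would never leave the loop in
invoke(), which is the lock-out hazard noted above.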

Scheduling events. Events can be scheduled to be triggered at specific
times. The procedure hic_schedule(), given an event and a time, adds the
event to the scheduled event list which is maintained in ascending order of
time. Periodically, the timer interrupts (currently every millisecond) and
readies every event on the list whose time has expired; these events are then
removed from the list. Periodic events are automatically rescheduled by the
dispatcher each time they are invoked.
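
One way to keep such a list is sketched below. The names sched_event,
schedule_insert() and timer_tick() are hypothetical, and readying an event
is reduced to setting a flag, whereas in HIC the readied events go on to be
dispatched.

```c
#include <stddef.h>

typedef struct sched_event {
    struct sched_event *next;
    long time;    /* time at which the event is to be triggered */
    int ready;    /* set when the timer readies the event       */
} sched_event;

/* Insert ev so that the list stays in ascending order of time.
   Events with equal times keep their order of insertion. */
void schedule_insert(sched_event **head, sched_event *ev, long when)
{
    ev->time = when;
    while (*head != NULL && (*head)->time <= when)
        head = &(*head)->next;
    ev->next = *head;
    *head = ev;
}

/* Timer interrupt: ready and unlink every event whose time has
   expired. Returns the number of events readied. */
int timer_tick(sched_event **head, long now)
{
    int readied = 0;
    while (*head != NULL && (*head)->time <= now) {
        sched_event *ev = *head;
        *head = ev->next;
        ev->next = NULL;
        ev->ready = 1;
        readied++;
    }
    return readied;
}
```

Because the list is sorted, the timer routine only has to look at the front
of the list and can stop at the first event whose time has not yet come.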

4.6     Periodic  Data  Buffers 

Periodic data buffers are discussed in the various examples of section 3. They
are similar in some respects to sticky messages used in the GEM operating
system developed at Ohio State University [Schw85] and to the out() and
read() primitives of the language Linda [Gele85]. There are four procedures
for accessing PDB's. To place data on a PDB a process calls pq_reserve(),
which returns a pointer to an empty data area; it then places the data in the
data area and calls pq_put() to place the data in the PDB. Pq_get()
returns a pointer to the most recent data put in the buffer (by pq_put()).
In general the data obtained by pq_get() is shared by multiple processes so
modifying the data in any way is undesirable. When a process is done with
the data obtained by pq_get(), pq_unget() is invoked to release it.
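
The cycle of calls can be illustrated with a deliberately simplified,
single-processor model. The functions pdb_reserve(), pdb_put(), pdb_get()
and pdb_unget() below are hypothetical stand-ins for the four HIC
procedures: they omit all locking, time stamps and put events, and the
payload is a single double instead of a data pointer.

```c
#include <stddef.h>

typedef struct pdata {
    struct pdata *next;
    int inuse;
    double value;           /* payload; stands in for the data area */
} pdata_t;

typedef struct {
    pdata_t *current;       /* most recent data, NULL before first put */
    pdata_t *reserve_list;  /* empty data areas                        */
} pdb_t;

static void pdb_release(pdb_t *b, pdata_t *d)
{
    d->next = b->reserve_list;
    b->reserve_list = d;
}

pdata_t *pdb_reserve(pdb_t *b)
{
    pdata_t *d = b->reserve_list;
    if (d != NULL)
        b->reserve_list = d->next;
    return d;               /* NULL when no area is available */
}

void pdb_put(pdb_t *b, pdata_t *d)
{
    d->inuse = 0;
    if (b->current != NULL && b->current->inuse == 0)
        pdb_release(b, b->current);
    b->current = d;
}

pdata_t *pdb_get(pdb_t *b)
{
    if (b->current != NULL)
        b->current->inuse++;
    return b->current;      /* NULL before the first put */
}

void pdb_unget(pdb_t *b, pdata_t *d)
{
    d->inuse--;
    if (b->current != d && d->inuse <= 0)
        pdb_release(b, d);
}

/* Producer side: reserve, fill in, put. Returns -1 on failure. */
int produce(pdb_t *b, double v)
{
    pdata_t *d = pdb_reserve(b);
    if (d == NULL)
        return -1;
    d->value = v;
    pdb_put(b, d);
    return 0;
}

/* Consumer side: get, read without modifying, unget. */
int consume(pdb_t *b, double *out)
{
    pdata_t *d = pdb_get(b);
    if (d == NULL)
        return -1;
    *out = d->value;
    pdb_unget(b, d);
    return 0;
}
```

Note that both produce() and consume() check for a null pointer, and that a
consumer which holds on to old data keeps that area off the reserve list
until it calls pdb_unget().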

Figure 4.7 shows the PBUFFER structure. Current points to the current
data, that is the PDATA structure put by the most recent call to pq_put().
When a PDB is initialized current is null. But once a call to pq_put()
has been made, current always points to the most recent data so that
subsequent calls to pq_get() always succeed.

Reserve_list points to a list of empty PDATA structures. A call to
pq_reserve() removes the first structure from the list and returns it. When
an area is freed by a call to pq_unget(), it is placed on the reserve list if it
is not the current area and there are no other processes using it (i.e. there
are no outstanding calls to pq_get() for this data).

Name  points  to  an  ASCII  name  for  the  PDB  for  debugging  purposes. 

/*
    PBUFFER is the structure for the periodic data buffers.
*/
typedef struct pbuffer
{
    PDATA*  current;
    PDATA*  reserve_list;
    char*   name;
    int     put_count;
    int     get_count;
    int     lock;
    int     lock_priority;
    int     processor;
    EVENT*  put_event;
} PBUFFER;

Figure  4.7:  Periodic  data  buffer  structure  (PBUFFER). 


Put_count and get_count are instrumentation counters which count the
number of calls to pq_put() and pq_get() respectively.

Lock and lock_priority form the PDB locking mechanism. Each of
the PDB procedures locks the PDB before manipulating the structure.
Locking a PDB includes raising the processor interrupt priority on the
processor, preventing interrupts so that no other process on the same
processor can become active and access the PDB. Lock_priority is used to
hold the previous processor priority while the PDB is locked. Locking also
involves a test-and-set operation on the lock field so processes on other
processors will wait (in a busy-wait loop) until the locking process unlocks
the PDB.
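
The test-and-set half of this mechanism can be sketched in portable C11
using atomic_flag in place of a hardware test-and-set instruction. This is
an illustration, not the HIC code: the names spin_pdb, spin_pdb_lock() and
so on are hypothetical, and raising the interrupt priority has no portable
equivalent, so it appears only as a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag lock;   /* plays the role of the PBUFFER lock field */
    int get_count;      /* example of a field the lock protects     */
} spin_pdb;

/* One test-and-set attempt; true if the lock was obtained. */
bool spin_pdb_try_lock(spin_pdb *b)
{
    return !atomic_flag_test_and_set_explicit(&b->lock,
                                              memory_order_acquire);
}

void spin_pdb_lock(spin_pdb *b)
{
    /* On the real system interrupts would be disabled here, and the
       previous interrupt priority saved, before the test-and-set. */
    while (!spin_pdb_try_lock(b)) {
        /* busy-wait until the holder unlocks */
    }
}

void spin_pdb_unlock(spin_pdb *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
    /* ... and the saved interrupt priority restored here. */
}

/* Example of a short critical section protected by the lock. */
void spin_pdb_count_get(spin_pdb *b)
{
    spin_pdb_lock(b);
    b->get_count++;
    spin_pdb_unlock(b);
}
```

The critical section is kept to a handful of instructions, which is what
makes busy-waiting tolerable.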

Processor is the code for the processor containing the PDB. For technical
reasons HIC requires that a PDB and its associated data areas reside on a
single processor (i.e. in the local memory for that processor).

Put_event points to an event associated with the PDB. If put_event is
non-null it is triggered each time pq_put() is invoked.

Figure 4.8 shows the structure of periodic data areas or PDATA's.

Next  points  to  the  next  PDATA  when  this  area  is  in  the  reserve  list  of  a 
pbuffer. 

Inuse is a counter of the number of processes currently using this PDATA.
When this area is returned by pq_get() inuse is incremented and when it
is released by pq_unget(), inuse is decremented. If inuse is zero for the
current area in a call to pq_put() then the current area is placed on the

/*
    PDATA is the structure for the Periodic Data Area.
*/
typedef struct pdata
{
    struct pdata*  next;
    int            inuse;
    int            event_id;
    int            procedure_id;
    int            time;
    int            processor;
    char*          data;
} PDATA;

Figure  4.8:  Periodic  data  area  structure  (PDATA). 


reserve list before the PDATA passed is made the new current area. Also
during pq_unget(), if the PDATA being released is no longer the current area
and its inuse count is zero then it is placed on the reserve list.

Event_id, procedure_id, and time identify the time and source of the
data in the PDATA. Pq_put() places the id's of the current event and
procedure and the current time in the PDATA before it is made the current one.
Thus a process getting data can determine if the data is stale and where the
data originated from.

Processor has the same meaning as in the PBUFFER structure.

Data  points  to  the  actual  data.  Typically,  a  PDATA  is  assigned  a  data 
area  when  it  is  initialized  and  it  never  changes.  However,  it  is  possible  to 
change  the  data  pointer.  It  must,  though,  point  to  the  same  size  data  area. 

Figures 4.9 through 4.12 contain the four PDB access routines, pq_get(),
pq_unget(), pq_reserve(), and pq_put(). The routines are very simple
and straightforward. In the procedures, DISABLE_INTERRUPTS and
ENABLE_INTERRUPTS are C macros that disable and enable interrupts on the
processor, respectively. The procedure tas() performs the test-and-set
operation on the indicated location. In pq_put(), the procedures
hic_event_id() and hic_procedure_id() return the identifier of the current
event and procedure respectively. The global variable, global->time, is the
current time. The procedure hic_dispatch() triggers the HIC event it is
passed as its argument.

A  major  consideration  behind  the  design  of  periodic  data  buffers  is  that 
they  not  cause  processes  to  block.    The  four  PDB  access  procedures  never 

/*
    pq_get (pq) returns a pointer to the most current data in the buffer
    indicated by pq.
*/

PDATA*
pq_get (pq)
    PBUFFER*  pq;
{
    register PDATA*  pdata = 0;
    int  b;

    /*
        Lock the PDB. If there is a most current buffer, then we'll use it.
        And increase the in use counter. Increment the get counter. Unlock
        the PDB.
    */

    DISABLE_INTERRUPTS;
    while (tas(&(pq->lock)) == 1) {}
    if (pq->current)
    {
        pdata = pq->current;
        pdata->inuse++;
    }
    pq->get_count++;
    pq->lock = 0;
    ENABLE_INTERRUPTS;
    return (pdata);
}


Figure 4.9: Pdb get procedure, pq_get().


block (this is almost true, see below); however, pq_reserve() and pq_get()
can fail. In both cases if they fail they return a null pointer. Well designed
programs will always check the returned pointer from these procedures.
Pq_reserve() will fail if there is no buffer available. Pq_get() fails if it
is called after the PDB has been initialized and before any buffer has been
made the current buffer.

It was stated earlier that PDB's do not cause events to block. This is
not really true. The lock field in a PBUFFER is used to block access to the
PBUFFER structure itself. A PDB is a shared resource with the potential for
simultaneous access, so some protection and synchronization is required. In
fact, this is the only place in HIC itself where a conflict arises. A
test-and-set protocol allows a processor accessing a PDB to lock out access
by processes

/*
    pq_unget (pq, pdata) terminates use of the periodic buffer pointed to
    by pdata.
*/

void
pq_unget (pq, pdata)
    PBUFFER*  pq;
    PDATA*    pdata;
{
    /*
        Lock the PDB. Decrement the in use counter for the buffer. If this
        is not the current buffer for pq and no one else is using it, then
        put pdata on the reserve list for pq. Unlock the PDB.
    */

    DISABLE_INTERRUPTS;
    while (tas(&(pq->lock)) == 1) {}
    pdata->inuse--;
    if ((pq->current != pdata) &&
        (pdata->inuse <= 0))
    {
        pdata->next = pq->reserve_list;
        pq->reserve_list = pdata;
    }
    pq->lock = 0;
    ENABLE_INTERRUPTS;
}


Figure 4.10: Pdb unget procedure, pq_unget().


on other processors. A locked out processor spins its wheels in a busy-wait
loop, which seems wasteful, but the amount of time a PDB is actually locked
is very small. In particular, the time spent busy-waiting is much less than
the amount of time it would take to suspend the process and enqueue it
while waiting for the lock to be lifted. On the other hand, a heavily
used PDB can cause delays of unpredictable length since there is no queue
of waiting processes maintained and there is a race among waiting processes
to seize a PDB when it is unlocked. Furthermore, when an event locks a PDB it
disables all interrupts on its own processor. The reason for this is to prevent
a higher priority event on the same processor from attempting to access the
same PDB, which would lead to deadlock. This simple approach is costly since
all higher level events on that processor are locked out, not just events that
are in conflict. And since an event locks out interrupts before performing
the test-and-set the lockout can spread from processor to processor. Note

/*
    pq_reserve (pq) reserves a data area for the buffer indicated by pq.
    A pointer to the data area is returned.
*/

PDATA*
pq_reserve (pq)
    register PBUFFER*  pq;
{
    register PDATA*  pdata = 0;

    /*
        Lock the PDB. Get a buffer from the reserve list, if there's
        anything there. Unlock the PDB.
    */

    DISABLE_INTERRUPTS;
    while (tas(&(pq->lock)) == 1) {}
    if (pq->reserve_list)
    {
        pdata = pq->reserve_list;
        pq->reserve_list = pdata->next;
    }
    pq->lock = 0;
    ENABLE_INTERRUPTS;
    return (pdata);
}

Figure 4.11: Pdb reserve procedure, pq_reserve().


that deadlock cannot arise in this situation, since the locking of PDB's is
restricted to the four access procedures (get, put, reserve, and unget) and
each of these accesses only one PDB at a time.

The question arises, why use PDB's if they exhibit such undesirable
behavior? The answer has two parts. The first part of the answer has to do
with the expected nature of the communication required. A typical scenario
is a single process periodically producing a set of data and a relatively small
number of consumers who need to examine the data according to their own
periods. Since a robot controller is a closed system (at least at this level) all
consumers of a PDB are in fact known to the system. This is in contrast to
the request queue for a server in a general purpose computer system where
the number of clients is potentially unbounded. The problems presented
in the previous paragraph would arise if several of the consumers were
synchronized in such a way that they happened to request the same data
at the same time (this is not unlikely in a robot controller where much of
the activity is synchronized). One solution would be to stagger slightly the
scheduled times of the using processes.

/*
    pq_put (pq, pdata) puts the data pointed to by pdata on the periodic
    buffer indicated by pq.
*/

void
pq_put (pq, pdata)
    PBUFFER*  pq;
    PDATA*    pdata;
{
    /*
        First, clear the inuse counter, set the source id's and time stamp
        for the buffer.
    */

    pdata->inuse = 0;
    pdata->event_id = hic_event_id();
    pdata->procedure_id = hic_procedure_id();
    pdata->time = global->time;

    /*
        Lock the PDB. Check the most current pdata. If it's not in use then
        put it on the reserve_list. Make pdata the most current. Increment
        the put counter. Unlock the PDB.
    */

    DISABLE_INTERRUPTS;
    while (tas(&(pq->lock)) == 1) {}
    if (pq->current &&
        (pq->current->inuse == 0))
    {
        pq->current->next = pq->reserve_list;
        pq->reserve_list = pq->current;
    }
    pq->current = pdata;
    pq->put_count++;
    pq->lock = 0;
    ENABLE_INTERRUPTS;

    /*
        If there is a put event for the buffer, trigger it.
    */

    if (pq->put_event)
    {
        hic_dispatch(pq->put_event);
    }
}


Figure 4.12: Pdb put procedure, pq_put().

To prevent failures in pq_reserve() the designer of a system must ensure
that there are a sufficient number of PDATA structures allocated to each PDB.
The straightforward method is to examine each procedure that accesses a
PDB and find the maximum number of PDATA structures it may hold at one
time, sum over all the procedures that access the PDB, and allocate that
number of structures. In a properly drawn diagram such as those in section 3
this is the same as counting the number of arrowheads on lines into and from
a PDB. It is also necessary to consider any instrumentation, monitoring,
or debugging processes that might access PDB's; PDATA structures must be
allocated for these. Allocating this number of structures ensures that there
will be no failures (excluding trying to get data before any has been put on
a PDB and assuming there are no bugs or system failures that cause PDATA
structures to be lost).

Merging PDB's. Each periodic data buffer can have an event associated
with it that is triggered whenever a put operation is performed on the PDB.
An example usage is to merge PDB's. Consider figure 4.13 which shows
two levels of control similar to the object and finger servos in figure 3.3 on
page 38. First note that the diagrams for the servos themselves have been
simplified; the finger servo box now represents the finger servo sequence
and the associated PDB's. The other servos are similarly simplified. To
put the figure in perspective, consider the situation where the four-fingered
Utah/MIT hand is grasping and manipulating an object with three of its
fingers while the fourth finger is in free motion, moving perhaps to a point
on the object where it will help establish a new grip. A strategy very similar
to this was used by Maw-Kae Hor to rotate objects with a planar
manipulator dubbed the Four Finger Manipulator [Hor87] (a predecessor of HIC
was used on these experiments). Now in the example, the object servo controls
three fingers using the finger position and forces from the finger servo and
producing target values in the grasp targets PDB. Independently, the free
motion servo is controlling one finger, again using the finger servo's actual
values and producing free targets in the free target PDB. The merge event
is associated with both the grasp PDB and the free PDB as their put event
and is triggered each time data is put on either of the PDB's. It then takes
the most recent data on each of its input PDB's and produces a single set of
target values for the finger servo's target PDB.

This method adds overhead to the system since there are two extra
calls to both pq_put() and pq_get() and the merge event must be invoked
and data copied etc. But it has the great advantage that it can be largely
transparent to the rest of the system. The buffers for the grasp targets and
the free targets are identical to the finger servo buffers, thus the same object
servo output can be "plugged" directly to the finger servo for tasks with no
fingers in free motion and likewise if all the fingers are in free motion there
is no need for the object servo and the merging; the free motion output
can communicate directly with the finger servo. Since essentially all that is
involved is the changing of pointers this sort of change can be accomplished
on the fly. Referring back to the discussion of homogeneous manipulations
in section 3.2, this is precisely the sort of activity that might occur during
the transitions from one homogeneous task to another.

Note that the object and free motion servos are truly independent. They
need not have the same cycle time or be synchronized in any way. This is
why the merge event is made the put event for both PDB's, so that updating
either set of targets causes the finger servo's targets to be updated. What
happens if the two servos produce their output at nearly the same time?
Suppose, for instance, that the free motion output follows closely behind the
object servo output. Under the HIC scheduler, if they are so close in time
that the merge event is triggered twice before it begins execution then the
merge event will only execute once and will have the most up-to-date data
from each source. If the free motion output comes while the merge event is
executing it will be performed a second time immediately upon completion.
Thus two sets of targets would be produced in rapid succession for the finger
servo. The first will have the latest targets from the object servo and slightly
out-of-date targets from the free motion servo; the second set of targets will
have the latest targets from both. There is an assumption in this example
that mixing slightly out of date targets with fresh targets is not a problem
so that if the finger servo happens to pick up the mixed set of targets it
will not cause the task to fail. If the finger in free motion is suitably far
from the object or the system is suitably compliant this assumption is valid.
If the task or system were sensitive to these mixed targets then some form
of synchronization would be required between the object and free motion
servos. If the two servos run at the same rate and in the same order each
cycle, then a simple synchronization technique would be to associate the
merge event with only one of the two input PDB's (the PDB for the servo
that finishes later each cycle). This way the merge event would only be
invoked once each cycle and would have the most up-to-date data in each
of its input PDB's.

Figure 4.13: Merging two PDB's.
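
The data handling inside a merge event might look like the sketch below.
The layout is an assumption made for illustration: each set of targets is
taken to be one value per finger, and a bit mask (free_mask, a hypothetical
parameter) records which fingers are currently driven by the free motion
servo. The real target data for the hand is of course richer than this.

```c
#define NUM_FINGERS 4

/* One target value per finger; a stand-in for the real target data. */
typedef struct {
    double target[NUM_FINGERS];
} finger_targets;

/* Build the finger servo's targets from the most recent grasp targets
   and free motion targets. Bit i of free_mask is set when finger i is
   in free motion; all other fingers take their target from the grasp. */
void merge_targets(const finger_targets *grasp,
                   const finger_targets *free_motion,
                   unsigned free_mask,
                   finger_targets *out)
{
    for (int i = 0; i < NUM_FINGERS; i++) {
        int is_free = (free_mask >> i) & 1u;
        out->target[i] = is_free ? free_motion->target[i]
                                 : grasp->target[i];
    }
}
```

In the merge event proper, grasp and free_motion would be the areas
returned by pq_get() on the two input PDB's and out would be an area
obtained from pq_reserve() on the finger servo's target PDB.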

4.7     Piggybacking HIC

HIC is not a general purpose operating system. It is suitable for a very
specific part of real-time systems, namely the servo loops for low level
control. At present, HIC is not an independent operating system; it runs on
top of MIT's Condor which supplies the operating system nuts and bolts. HIC
will always run as one part of a larger system.

Figure 4.1 on page 43 shows the hardware configuration used to control
the Utah/MIT hand. The Sun 3/160 controls the system. Programs are
developed on the Sun, compiled, and downloaded to the Ironics processors
which run the HIC/Condor combination. Current plans are that a single
processor will be dedicated to the raw joint servo (see section 3) running on
HIC on top of Condor. Similarly, the finger and object servos will run on
other processors (or perhaps on a single processor). Higher level supervisory
tasks are less likely to fit the HIC form and will require more sophisticated
process management for file I/O and so on, though they will still have
real-time constraints. One or more of the Ironics processors will be used for
these tasks under the Sage operating system [Salk88].

Sage is a real-time operating system for supervisory control developed
at the NYU Robotics Laboratory. Compared to HIC, Sage is much more
sophisticated. It provides true multi-tasking, memory management,
interprocess communication and synchronization primitives, as well as support
for file I/O while still providing real-time response where needed. It thus fills
an important gap between the very low level functions of HIC and the general
capabilities of the host's development system. A scenario one might imagine
is that task planning (i.e. deciding what is to be done) is done on the host
processor. This involves complex decision making processes and intensive
computation and to some extent is not bound by real-time constraints. The
complex tasks will be managed by supervisory level programs running on the
real-time processors (the Ironics) under Sage. These processes handle the
sequencing and transition between the homogeneous tasks that make up the
complex tasks. And finally, the homogeneous tasks are handled primarily
by HIC tasks.

A variant of this architecture is to have HIC run as a single task under the
control of another operating system such as Sage. Since HIC requires only
a single stack it can run completely within a Sage process. This would be
useful in a smaller system where a single processor is sufficient for both the
low level servos and the supervisory control. The HIC task would run at a
high priority to ensure the HIC tasks met their deadlines and the supervisory
tasks would essentially run in the background.

In both of these scenarios the primary method of communication between
HIC tasks and other tasks is via periodic data buffers. Since PDB access never
blocks it can be easily implemented within any operating system.

Process blocking is a factor here also. If HIC is running on top of Sage
and one of the HIC events accesses a Sage facility it may block. In this
case the entire HIC system will block since HIC appears to Sage as a single
process. There is nothing inherently wrong with this but the programmer
must remember that all of the HIC events are also blocked.

4.8     Benchmarks 

Some benchmarks were run on a HIC system. The processor was a Motorola
68020 running at 16.7 MHz. The timings were made on a system where
nothing else was happening so they tend to represent the best case
values.

The time to perform pq_put() and pq_get() (in a sense the time to
send a message) is 74.7 µsec. In most PDB's there are several pq_get()'s
performed for each pq_put() so the cost of putting the item on a PDB can
be spread over the several "gets". The time for pq_get() by itself is 24.9
µsec.
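
As a rough worked example of this spreading, the sketch below computes the
cost per "get" for a given number of gets per put. It rests on an assumption
that is not stated in the measurements: that the cost of pq_put() alone is
the difference of the two figures, 74.7 - 24.9 = 49.8 µsec. The function name
amortized_get_cost_usec() is hypothetical.

```c
/* Estimated cost in microseconds of one pq_get(), with the cost of the
   single pq_put() shared equally among gets_per_put consumers. */
double amortized_get_cost_usec(int gets_per_put)
{
    const double put_and_get = 74.7;   /* measured: one put plus one get */
    const double get_only = 24.9;      /* measured: one get              */
    /* Assumption: the put by itself costs the difference of the two. */
    const double put_only = put_and_get - get_only;

    return get_only + put_only / gets_per_put;
}
```

Under this assumption, three consumers per put would bring the cost per
delivered item down from 74.7 µsec to roughly 41.5 µsec.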

Pq_put() and pq_get() are not used in isolation. They are part of a
cycle of routines for managing PDB's. The complete cycle of pq_reserve(),
pq_put(), pq_get(), and pq_unget() takes 130.8 µsec.

Finally, the time to invoke hic_dispatch() (essentially the time for a
context switch) is 47.4 µsec. This is the overhead in triggering an event.
This plus the basic time to respond to an interrupt is also the overhead for
hardware interrupts.


Chapter  5 
A network, such as Ganglia, which intends to support real-time
programming in turn has the system's real-time constraints imposed on it in
the form of time constraints on the messages. Ganglia attempts to deliver
every message on schedule. The primary method used is to use a
deterministic schedule computed off-line for the bulk of the time constrained
traffic, the servo loop messages. Ganglia also does not ensure that messages
are delivered intact. Attempting to ensure the delivery of messages requires
that any message may have to be transmitted an arbitrary number of times. This
introduces uncertainty into the schedule which goes against the desire for
predictable scheduling. In addition, much of the traffic in a robot controller
is not overly sensitive to lost messages (see the discussion of servo loops in
the Introduction) so it is appropriate for Ganglia to provide an unreliable
datagram service instead of a guaranteed service. There are two reasons
why deterministic scheduling is feasible within a robot controller: firstly,
it is a closed system in that the network itself is not dynamically modified
and secondly, much of the activity is very regular and predictable, primarily
because of the servo loops used in the low-level control. Thus for a given
system a suitable schedule for the network traffic can be computed off-line
and compiled into the system. Of course, unexpected emergencies or
exceptional conditions cannot be planned for in an off-line scheduler; instead
Ganglia is careful about when exceptional conditions can be raised within
the system and then immediately adjusts the network traffic as needed to
best respond to the exception. This can be likened to a typical processor's
response  to  hardware  interrupts.  Interrupts  are  only  permitted  between  in- 
structions when  the  processor  is  in  a  "simple"  and  restorable  state  and  then 
the  interrupt  is  handled  by  switching  to  a  pre-planned  routine  which  is  to 
appease  the  interrupting  device  and  return  to  the  interrupted  program  flow 
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as  soon  as  possible. 

5.1     Inter-process  Communication. 

An important discriminating characteristic of robot control systems is the
types of inter-process communication available or employed. The typical
architecture for robot controllers is multiple computers or processors on a
common bus. The major components, processors and devices, are connected
to the bus and most communication is via shared memory, although other
communication formalisms are often placed over the shared memory.
Frequently, though not universally, multiple tasks run on each of the
processors and these tasks control the robot. For the most part the tasks
are statically allocated to the processors and do not migrate among them.
Often they are neither killed nor spawned but are created at load time and
exist for the duration of the system. This static nature of processes or
tasks is a reflection of the closed nature of robot control systems and can
simplify inter-process communication.

"Messages passed on queues" is a common inter-process communication
paradigm and is used in some robot control systems. Harmony [Gent84] uses
it extensively while NRTX [Kapi84] and SAGE [Salk88] support it. Harmony
constructs an entire real-time programming style around message passing
[Gent81]. Great care is taken so that processes do not block unexpectedly.
For instance, the sending process (i.e. the process initiating an exchange)
supplies all the buffer area needed for the complete message exchange, so
that no process need access a buffer or message pool, since an empty pool
would require that the process be blocked. The sending process blocks until
the processing of its message is completed by a reply from the destination
process. A receiving process blocks only if there are no messages for it to
handle. Eliminating unexpected blocks removes much uncertainty about
how long it will take to process a request for a task, so that the programmer
can construct the task to meet critical deadlines. However, queues usually
mean some uncertainty in time since there may be several messages ahead
of a task's message.

A common form of inter-process communication is simply signaling an
event with little, if any, additional data. An example of this type of
communication from a time sharing system is signals in UNIX [Kern84], which
are conceptually similar to hardware interrupts. NRTX, which is a descendant
of UNIX, provides such signals. SAGE and Condor [Sieg85][Sieg86], a low level
operating system for the Utah/MIT hand, provide signaling via mailbox
interrupts, by which one processor can cause an interrupt on another
processor (or on itself) and transmit a single word of data in the process.
The appeal of signals is that for many types of events asynchronous
signaling that the event has occurred is all that is needed, and the
efficient implementation of signals can simplify the programming around
such events.

Process synchronization is a crucial form of inter-process communication.
How synchronization is presented to the programmer is another
distinguishing characteristic of real-time operating systems. In Harmony
synchronization is implicit within the message passing since the sending of
a message is not complete until the receiver acknowledges it. In NRTX and
SAGE synchronization is available in several forms including messages,
signals, test-and-set instructions and semaphores. Condor does not provide
synchronization directly but the synchronization facilities of the underlying
hardware are available, i.e. the test-and-set instruction. In fact, process
blocking is virtually impossible in Condor because a single stack is used for
all processes on a processor. NYMPH [Chen86], another low level operating
system, provides facilities for multiple processors to participate in
synchronization. In fact, this is about the only service NYMPH provides.
Modern micro-processors and busses support inter-processor synchronization
with a test-and-set or similar operation that eliminates the ambiguities
possible with simultaneous references to a common variable by multiple
processors.

As mentioned previously, the typical robot control system is implemented
on multiple micro-processors on a common bus and in these systems shared
memory is the underlying communication medium. For all the mentioned
systems except Harmony, shared memory is the primary method of passing
all but small amounts of data, and even in Harmony shared memory
facilitates the implementation of its message passing primitives. This
architecture has great appeal: construction of the controller is easy, using
standard busses and boards; the system can be quite modular and expandable
for the same reason; shared memory is a reliable and fast communication
medium; it is conceptually simple and flexible for the programmer; and
other communication formalisms can be implemented on top of shared memory.

Ganglia is a suitable communication base for supporting all of these
forms of communication, including, to some extent, shared memory. It
should be noted that the GANGLIA architecture does not specify what
operating system is running on the individual nodes, and a GANGLIA system
is likely to have a heterogeneous collection of operating systems on the
nodes. Consequently, GANGLIA tries to support a very broad range of
communication styles in an efficient manner.

To this end GANGLIA implements a network-wide global memory. There is an
address space common to all the nodes on the network. A node writes to
this address space by putting the data on the network along with the
address of the data, and all the nodes then have access to the data. The
memory is not implemented as a separate entity on the network; instead
each node maintains the portion of the memory in which it is interested.
Parts of the memory are thus duplicated in various nodes of the network.

This implements an approximation to a shared memory on the network which
is deficient in two main regards when compared to memory on a common bus.
Firstly, there is a delay in communication, which means that changes to
the memory are not "instantaneous" and are thus a cause for concern in time
critical processes. Secondly, because the memory is duplicated on the
various nodes and communication failure and delay are unavoidable, the
global memory may at times be inconsistent, that is, different nodes may
see different values in the memory. Many of the details of the GANGLIA
design are intended to contain or minimize these deficiencies.

To facilitate inter-process and inter-processor signaling, the processor in
a node may request to be interrupted when a particular item of data appears
on the network (i.e. when the datum is changed). To provide synchronization
between processors, GANGLIA implements at a low level a message
acknowledgement scheme and a test-and-set message. Again, because of
potential communication failure these are not as reliable as similar
facilities between processors on a common bus, but with a reliable
communication medium the GANGLIA implementations should provide useful and
efficient methods for process synchronization.
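The replicated global memory can be illustrated with a small simulation.
This is only a sketch of the idea described above, not the GANGLIA design
itself: the class Node, the function broadcast, and the lost_by parameter
(used to simulate a message that fails to reach some nodes) are all
hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set


class Node:
    """A node that keeps only the part of the global memory it cares about."""

    def __init__(self, name: str, interested: Iterable[int]) -> None:
        self.name = name
        self.interested: Set[int] = set(interested)
        self.memory: Dict[int, int] = {}   # local copy of selected addresses
        self.handlers: Dict[int, Callable[[int, int], None]] = {}

    def watch(self, address: int, handler: Callable[[int, int], None]) -> None:
        """Ask to be 'interrupted' when data for this address appears."""
        self.interested.add(address)
        self.handlers[address] = handler

    def receive(self, address: int, value: int) -> None:
        if address not in self.interested:
            return                          # not of interest: discard
        self.memory[address] = value
        handler = self.handlers.get(address)
        if handler is not None:
            handler(address, value)


def broadcast(nodes: List[Node], address: int, value: int,
              lost_by: Iterable[str] = ()) -> None:
    """Write to the global memory by sending (address, value) to every node.

    Nodes named in lost_by miss the message (simulated failure).
    """
    missed = set(lost_by)
    for node in nodes:
        if node.name not in missed:
            node.receive(address, value)
```

If one node misses a broadcast, its copy keeps the old value while the
others hold the new one. This is the inconsistency described in the text,
and it persists until the next successful update of that address.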

Two additional characteristics of robot control programs affect the
communication. Both are artifacts of servo loops. First is the periodicity
of servo loop messages. Servo loops run within strict intervals and the
communication traffic associated with them will follow similar patterns.
The communication scheduler can take advantage of this since the period,
timing, and communication traffic for the servo loops are known in advance.
The second characteristic is that much of the servo loop traffic is repeated
reporting of continuously varying values such as joint position or force
sensed by an end effector. The importance of this for the communication
system is that an occasional lost message is not likely to be disastrous.
This assertion is reasonable since continuous values are not likely to
change much in one cycle, so the previous value can be used instead of the
lost value, and since an updated value will be received in the next cycle
the servo can recover during the next cycle. The consequence is that many
servo messages, very likely the most frequent type of message on the
network, do not require acknowledgement, assuming that the underlying
communication is reliable enough.
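The recovery from a lost servo message amounts to holding the last value
received. A minimal sketch, assuming a lost sample is represented as None
and that some initial value is available before the first message arrives
(both are conventions chosen here for illustration):

```python
from typing import List, Optional


def hold_last_value(samples: List[Optional[float]], initial: float) -> List[float]:
    """Replace each lost sample (None) with the most recent value received."""
    result = []
    last = initial
    for sample in samples:
        if sample is not None:
            last = sample        # a fresh value arrived this cycle
        result.append(last)      # otherwise reuse the previous value
    return result
```

For a slowly varying joint position the substituted value is off by at most
one cycle's worth of motion, and the error disappears when the next
message arrives.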

The rest of this chapter describes GANGLIA in detail. First we look at
the physical aspects of the network: the medium, the data link layer and
the node architecture. The remainder of the chapter presents the GANGLIA
protocol. Chapter 6 describes the communication memory management unit
(CMMU), which is the central component in the GANGLIA architecture.

It is useful to keep in mind examples of GANGLIA systems. A simple system
could consist of a six degree of freedom robot arm with a complex
end-effector, such as the Utah/MIT hand [Jaco84]. It is important to
remember that the controller for the robot is distributed around the robot,
so that the processor controlling the actuator for a joint may be in a
different node than the position sensor for the same joint. The end-effector
need not be a manipulator but could be a sensor, for instance an image
processing system, and again at least some of the processing of the sensory
data is done near the sensor. Also, the problem is not very interesting if
the arm is simply a positioning device for the end effector, so it is
assumed that there is real-time interaction between the end effector, its
task, and the arm.

The following chapter describes the Communication Memory Management
Unit in considerable detail. Then chapter 7 presents an analysis of
GANGLIA's communication characteristics.

5.2     The  Physical  Layer. 

The physical layer for GANGLIA is not yet specified but some characteristics
can be stated. The medium is to be a high speed serial bus. The use
of a parallel bus, say eight bits wide, has been considered but at present
it is assumed that a serial bus is used. The bandwidth of the network is
yet to be determined. Most likely, all the initial experiments will be
performed on twisted pair or coaxial cables with data rates of around 10
megabits per second. Preliminary calculations indicate that this should be
sufficient for networks of a few dozen nodes. To support networks with
hundreds of nodes a different technology, perhaps fiber-optics, will be
necessary. It is intended that a GANGLIA network be restricted to a "single"
robot, thus the network need not be very long. A maximum length of 100
meters is considered to be adequate. Unlike most local area networks, the
cable for a GANGLIA network is likely to be subject to frequent flexing;
consider, for instance, the section of a cable passing an arm's elbow.
Selecting a cable able to withstand this abuse may over-shadow all other
constraints on the physical layer.

5.3  Data  Link  Layer. 

The data link layer is not considered in detail in this thesis. The
requirements are good reliability with low overhead. Messages in a GANGLIA
network will be small, so the overhead per message has a large effect on
the network's effective throughput. Detection of transmission errors is
important, so some form of redundancy check is necessary. Finally, there is
the question of collision detection. Currently it is assumed that a
transmitting node can detect a collision with another message and then
terminate transmission. It is very possible that this is not needed at all
and the ability to detect collisions can be removed if other considerations
indicate it to be too expensive.
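The effect of per-message overhead on small messages can be shown with a
simple ratio. The sketch below is illustrative only: the overhead figure
used in the example is hypothetical, since neither the data link frame nor
the header size is specified here, and the model ignores gaps between
messages.

```python
def effective_throughput(link_bps: float, payload_bytes: int,
                         overhead_bytes: int) -> float:
    """Payload bits delivered per second when every message carries
    payload_bytes of data plus overhead_bytes of header and framing."""
    if payload_bytes < 0 or overhead_bytes < 0:
        raise ValueError("sizes must be non-negative")
    total = payload_bytes + overhead_bytes
    if total == 0:
        raise ValueError("message must have non-zero size")
    return link_bps * payload_bytes / total
```

With a hypothetical 12 bytes of overhead, a 4 byte sensor reading on a 10
megabit link yields only 2.5 megabits per second of useful data, whereas a
full 256 byte message would use most of the link.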

5.4  Node  Architecture. 

A block diagram of a GANGLIA node is shown in figure 5.1. There are five
major components: the device interface, the processor, the processor's
memory, the communication memory, and the communication memory management
unit. The device interface is the node's connection to the device or devices
it controls, most likely some D/A or A/D converters. Some nodes are
computational nodes and may not be connected to any device. The processor
controls the node. The memory is simply the processor's local memory. The
communication memory is where messages are stored into and transmitted
from. It is the portion of the global memory (see section 5.1) maintained
locally. Finally, the communication memory management unit (CMMU)
maintains the communication memory and is the node-level interface to the
network. The CMMU is described in detail in chapter 6.

5.5  Protocol 

For discussion of the network protocol a simple abstract example system
will be used. It is shown in figure 5.2. Node C is the control node which
will be described later. Node A is a higher level node or perhaps a link
to a host computer running the high level control program. The nodes X,
Y, and Z are sensory nodes consisting of a processor connected to some
Analog-to-Digital converters. The remaining nodes, M and N, are actuator
nodes. As with the sensory nodes they contain a processor and the necessary
electronics to drive an actuator.

5.5.1     Messages. 

This section describes the structure of GANGLIA messages. Table 5.1
describes the fields of a GANGLIA message excluding any header or trailer
information for the data link frame.

Message taxonomy. Before describing GANGLIA messages, we will
characterize the types of messages. First, messages fall into one of two
categories based on their regularity.

•  Periodic messages happen at regular known intervals. This class is
important because the communication network can take advantage
of their periodicity to improve performance. Periodicity is most
important when the frequency is high, so low frequency periodic
messages may sometimes be treated as sporadic messages.

•  Sporadic  messages  occur  randomly,  from  the  standpoint  of  the  network 
scheduler.  There  is  very  little  the  network  can  do  to  prepare  for  these 
messages  except  perhaps  on  a  statistical  basis. 

Messages  fall  into  four  categories  based  on  their  relation  to  the  state  of 
the  system. 

•  Servo messages are typified by messages containing raw sensor values
or actuator commands. They are high frequency periodic messages
that are critical to the system's integrity and as such make up a
background traffic that must always run. If this traffic stops then control
of the robot in any real sense has been lost. In current robot systems
actuators and sensors are usually tightly coupled to the processor that
controls them at this level, so that loss of control at this level is
unlikely. In a GANGLIA system, however, it might well be that actuators
and their controlling processors communicate over the network, so that
disruption of the net may disable the most basic control.

A variant of servo messages are instrumentation messages, messages
used to record certain values for later analysis or for operator feedback.
These messages may be periodic and of high frequency like the servo
messages but they are not as critical to the system integrity.

•  Steady state messages are likely to be periodic but at a lower frequency
than the above messages. Their main role is to maintain some global
state of the system. Consider an arm grasping an object and following
a specified trajectory in free space. One would expect periodic
messages that ensure that the grasp is properly maintained and that the
progress along the path is appropriate, or that indicate that the path
should be changed.

•  Changing state messages indicate or cause significant modifications
in the system state. Their source is a point higher in the control
hierarchy and they are typically sporadic. Their significance is that the
communication network may be able to change its characteristics to
better suit the changing state. Using the above example, when the
object contacts an anticipated surface one could predict a flurry
of communication activity while the robot system changes from the
free motion state to the constrained motion state. The interim state
may be relatively long in duration while parameters are calculated
and loaded and processes are started and stopped. During this period
the communication traffic may be considerably different than during
the free motion state, and the traffic pattern for the constrained
motion state may be different still.

•  Exception messages are similar to hardware interrupts or UNIX signals
in their effect on the system state. The messages must have high
priority. They are likely to precipitate a lot of network traffic and
require special attention. As mentioned earlier, GANGLIA's approach
is to field these messages at specific times that minimize disruption to
the network schedule and then to aid in the resolution of the exception
and the return to "normal" operation.

Frequency  and  priority.     In  the  above  categories  one  would  expect  that 
the  frequency  of  messages  would  be  as  follows,  in  decreasing  order: 

•  Servo 

•  Steady  state 

•  Changing  state 

•  Exception

Servo messages are thus the most frequent and exceptions, one hopes, are
very rare.

Priority  on  the  other  hand  should  be  assigned  as  follows,  in  decreasing 
order: 

•  Servo 

•  Exception 

•  Steady  state 

•  Changing  state 

The  relative  priority  of  steady  state  and  changing  state  messages  is  arguable 
but  the  significance  of  ranking  the  servo  messages  higher  than  the  exceptions 
is  that,  as  mentioned  above,  disruption  of  the  servo  messages  may  mean  total 
loss  of  control  which  would  complicate  or  prevent  an  appropriate  response 
to  the  exceptional  condition.  An  assumption  here  is  that  the  servo  traffic 
is  not  so  dense  that  an  exception  cannot  be  raised  until  it  is  too  late.  If 
such  a  situation  arises  the  network  is  clearly  overburdened  and  a  significant 
restructuring  of  the  network  and  control  system  is  called  for. 
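The two orderings can be written down side by side. The following is a
small sketch, assuming messages are tagged with one of the four categories;
the names Category and by_priority are illustrative and are not part of
GANGLIA.

```python
from enum import Enum
from typing import List


class Category(Enum):
    SERVO = "servo"
    STEADY_STATE = "steady state"
    CHANGING_STATE = "changing state"
    EXCEPTION = "exception"


# Expected frequency, most frequent first (as listed in the text).
FREQUENCY_ORDER = [Category.SERVO, Category.STEADY_STATE,
                   Category.CHANGING_STATE, Category.EXCEPTION]

# Priority, highest first (as listed in the text).
PRIORITY_ORDER = [Category.SERVO, Category.EXCEPTION,
                  Category.STEADY_STATE, Category.CHANGING_STATE]


def by_priority(pending: List[Category]) -> List[Category]:
    """Order pending messages so that the highest priority comes first."""
    return sorted(pending, key=PRIORITY_ORDER.index)
```

Exceptions are the rarest category yet rank second in priority, directly
behind the servo traffic that keeps the robot under control.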

Named messages. Messages in GANGLIA are not addressed to particular
destinations. Instead each message has in its header the name of the data in
the message; these are called named messages. In the example (figure 5.2)
consider the node X, which monitors and reports some sensors, say, for joint
positions for the joints m and n. Periodically this node must transmit a
message or messages containing the current positions to nodes M and N,
which calculate corrections to the position actuators based on these sensed
positions and then adjust the joints. On a typical broadcast network the
straightforward approach would be to transmit two messages, each with the
address of the destination node (M in one, N in the other), some indication
of what data is in the message (typically a communication port or process
number), and the actual data (the positions of joints m and n). In a network
with multicast capabilities, a single message could be transmitted
containing the multicast address, some indication of the data, and the data
itself. Both M and N will receive the single message. In GANGLIA a single
message is transmitted containing the name of the data ("Joint positions for
m and n") and the data itself. Every node on the network receives the
message and decides whether or not it is interested in the joint positions;
if so the value is stored, otherwise the data portion of the message is
discarded. Clearly, if the processor in each node must examine every message
and decide if it is of interest then a great deal of processing power will
be wasted on the communication. The CMMU in each node relieves the
processor of this burden.

In table 5.1 the data's "name" is contained in four fields. Name is the
data's address in the global memory. Length is the length of the data item;
the maximum is 256 bytes. This also corresponds to the page size in the
global memory. Source is the identifier of the source node. It is included
as part of the name so that if a data item has more than one source the
receiving nodes can determine the source. Time is a time stamp. When
the message is created the time is recorded and then transmitted with the
message. This can be used to detect inconsistencies in the global memory
and to detect data that has gone stale.
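The four fields and the staleness check can be sketched as follows. The
field widths and the unit of the time stamp are not given here, so plain
integers and an abstract "tick" are assumed; DataName and is_stale are
hypothetical names, and a zero-length item is allowed only as a guess.

```python
from dataclasses import dataclass

MAX_LENGTH = 256  # bytes; also the page size of the global memory


@dataclass(frozen=True)
class DataName:
    name: int     # address of the data in the global memory
    length: int   # length of the data item in bytes
    source: int   # identifier of the source node
    time: int     # time stamp taken when the message was created (ticks)

    def __post_init__(self) -> None:
        if not 0 <= self.length <= MAX_LENGTH:
            raise ValueError("length must be between 0 and 256 bytes")


def is_stale(item: DataName, now: int, max_age: int) -> bool:
    """True when the data is older than max_age ticks."""
    return now - item.time > max_age
```

A receiving node could apply such a check before using a stored value, for
example refusing to act on a joint position that is several servo periods
old.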

Addressed messages. It should be noted that the named message scheme
can support normal "addressed" messages. All that is necessary is to assign
one page of the global memory to each of the nodes. Then messages intended
for a particular node are sent to the page assigned for that node. Of course,
each node must be prepared to receive data for its page. The significance
of simulating addressed messages is that the named messages scheme does
not preclude any acknowledgement or synchronization protocols available in
typical networks.
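One way to picture the page-per-node assignment is as a simple address
calculation. The layout below is purely an assumption made for
illustration: the base address, the node numbering, and the node count
(seven, matching the example system of figure 5.2) are not specified in
the text.

```python
from typing import Optional

PAGE_SIZE = 256          # bytes, the page size of the global memory
MAILBOX_BASE = 0x1000    # hypothetical start of the per-node pages
NODE_COUNT = 7           # hypothetical: nodes C, A, X, Y, Z, M, N


def mailbox_address(node_id: int) -> int:
    """Global memory address of the page assigned to node_id."""
    if not 0 <= node_id < NODE_COUNT:
        raise ValueError("unknown node")
    return MAILBOX_BASE + node_id * PAGE_SIZE


def addressee(name: int) -> Optional[int]:
    """Node whose page contains this address, or None for ordinary data."""
    offset = name - MAILBOX_BASE
    if 0 <= offset < NODE_COUNT * PAGE_SIZE:
        return offset // PAGE_SIZE
    return None
```

A sender "addresses" node 3 by naming mailbox_address(3), and node 3
registers interest in that page like any other named data.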

The token. The first two fields, next and thread, make up the token.
Each message contains a token that indicates which node is to transmit
next and a hint as to what should be transmitted. Thus each message
grants access to the network to the node in the token. A node receiving
the token is obliged to transmit a message immediately, otherwise it will be
assumed that the token has been lost and corrective action will be taken.
Thread is an indication of what action is expected of the node in next.
In general, when a node receives the token it may have several messages
ready to transmit; thread suggests which message should be transmitted.
Section 5.5.3 explains this in more detail.
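Token passing along a thread can be sketched with a per-node lookup table,
on the reading (stated in the discussion of access control) that thread
indexes a table in each node's CMMU to find the next node. The table
contents, thread number, and function name below are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-node tables: node -> {thread number: next node}.
NEXT_IN_THREAD: Dict[str, Dict[int, str]] = {
    "X": {1: "Y"},
    "Y": {1: "Z"},
    "Z": {1: "C"},   # the last node returns the token to the control node
}


def run_thread(first: str, thread: int, control: str = "C") -> List[str]:
    """Return the order in which nodes transmit before the token
    comes back to the control node."""
    order: List[str] = []
    holder = first
    while holder != control:
        if holder in order:
            raise ValueError("thread never returns to the control node")
        order.append(holder)                      # holder transmits
        holder = NEXT_IN_THREAD[holder][thread]   # next field of its message
    return order
```

Here the control node starts thread 1 by naming X in its token; X, Y, and Z
then transmit in turn without further intervention from the control node.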

Flags. The flags field in table 5.1 contains various flags and codes
relating to the message. The EMPTY_MESSAGE flag indicates that the
message contains only the header; there is no data with the message. The
EXCEPTION flag indicates that the message is an exception message. The
ACKNOWLEDGEMENT_REQUESTED flag indicates that the receiving node should
reply with an acknowledgement message. The MESSAGE_TYPE is a code for
the type of message. The EXCEPTION_LEVEL indicates the level of exception
processing. Zero is the normal level, so a message with level 0 is a normal
data transfer message. This is much like the interrupt level found in most
micro-processors (section 5.5.4).

A message with level 0 and the EMPTY_MESSAGE flag set is considered a
NULL_MESSAGE and contains simply the header. It is used simply to pass the
token on, primarily by the control node. The OVERRUN type is used by a node
that receives the token when it should have a message to transmit but
doesn't. The node cannot wait for the message to be made ready so it
transmits an empty message with type OVERRUN. The EXCEPTION_POLL type is
used to poll for exception messages (see section 5.5.4). IDLE_NETWORK is
used by the control node to indicate that it expects the network to be idle
following this message. Usually, an idle network indicates that the token
has been lost and recovery procedures should be initiated. The remaining
types, CLEAR_TEST_AND_SET, TEST_AND_SET, TEST_AND_SET_REPLY (0),
TEST_AND_SET_REPLY (1), POSITIVE_ACKNOWLEDGEMENT, and
NEGATIVE_ACKNOWLEDGEMENT, are used for low level synchronization between
the nodes; see section 6.2.2.
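The relationship between the flags and the null message can be expressed
compactly. In this sketch the bit positions are arbitrary assumptions, as
the actual encoding of the flags field is not given here; only the flag
names come from the text.

```python
from enum import IntFlag


class Flags(IntFlag):
    # Bit values are hypothetical; only the names come from the text.
    EMPTY_MESSAGE = 0x1
    EXCEPTION = 0x2
    ACKNOWLEDGEMENT_REQUESTED = 0x4


def is_null_message(flags: Flags, exception_level: int) -> bool:
    """A null message is a level 0 message with EMPTY_MESSAGE set."""
    return exception_level == 0 and bool(flags & Flags.EMPTY_MESSAGE)
```

A node receiving such a message has nothing to store; its only concern is
whether the token in the header names it as the next transmitter.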

5.5.2     Access  control. 

On a GANGLIA network access to the network is tightly controlled. The
control node (node C in figure 5.2) is primarily responsible for distributing
access. In a larger sense the control node is responsible for the
integrity of the network. It is responsible for detecting that the token is
lost and then initiating recovery. It is an important part of the priority
scheme that filters out less important messages during the system's response
to exceptional conditions. It is also responsible for switching from one
scheduling plan to another as the system's demands change.

Communication cycles. The existence of servo loops is one of the
dominant characteristics of robot control programming. Likewise servo loop
communication plays a dominant role on the network. Under normal
operation, communication on the network is divided into communication
cycles. Communication cycles occur at a rate which is a common multiple of
the various servo loop rates. During any one cycle some set of the nodes will
have servo messages (see section 5.5.1) to transmit and some nodes will
have steady state, changing state, or exception messages to transmit.
The control node knows to a large extent what messages are expected
during any cycle. In particular it knows what servo messages will be
transmitted during a particular cycle. It may know what steady state
messages will be transmitted in any particular cycle. This scheduling
information is generated when the system is compiled and placed in tables in
the control node.
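The relation between the cycle rate and the servo loop rates can be
illustrated with a short calculation. The sketch assumes integer rates in
hertz and picks the least common multiple as the cycle rate, which is only
one possible choice of common multiple; the loop names and rates in the
example are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import Dict, List


def cycle_rate(rates_hz: List[int]) -> int:
    """Least common multiple of the servo loop rates."""
    return reduce(lambda a, b: a * b // gcd(a, b), rates_hz)


def senders_in_cycle(loops: Dict[str, int], cycle: int) -> List[str]:
    """Names of the loops that have a servo message due in the given cycle."""
    rate = cycle_rate(list(loops.values()))
    # A loop running at r Hz transmits once every (rate // r) cycles.
    return sorted(name for name, r in loops.items()
                  if cycle % (rate // r) == 0)
```

With loops at 500, 250, and 100 Hz the cycles run at 500 Hz; the fastest
loop transmits in every cycle, the 250 Hz loop in every second cycle, and
the 100 Hz loop in every fifth. A table of this kind is what an off-line
scheduler could compile into the control node.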

A  typical  communication  cycle  contains  the  following  steps: 

1.  The control node sends out a begin servo message which marks the
beginning of a communication cycle and passes the token to the first of
the nodes to transmit a servo message.

2.  The  indicated  node  transmits  its  data  in  a  message  that  indicates 
the  second  node  to  participate,  which  in  turn  passes  the  token  to  the 
third  node  and  so  on.  Each  node  knows  which  node  is  to  foUow  from 
the  information  generated  at  compile  time  for  the  control  node.  A 
sequence  of  nodes  transmitting  messages  without  intervention  by  the 
control  node  is  called  a  thread.  The  thread  field  in  the  header  of  the 
messages  indicates  the  current  thread  and  indicates  to  a  node  receiving 
the  token  which  node  is  to  receive  the  token  next  by  indexing  a  table 
in  the  CMMU.  The  final  node  in  the  servo  thread  passes  the  token 
back to the control node. It is possible that several servo threads may
be  used  in  a  particular  cycle.  When  a  thread  is  complete,  the  control 
node  begins  the  next  thread  by  passing  the  token  to  the  first  node  in 
that  thread. 

3.  Following  the  last  servo  thread  the  control  node  then  transmits  an 
exception  poll  message.  This  message  indicates  to  all  the  nodes  that 
exceptions can be raised. After transmitting the message the control node
waits a small amount of time for any exception to be raised. Nodes
wishing  to  transmit  an  exception  message  access  the  network  during 
this  interval  in  a  random  manner.  Clearly  the  hope  is  that  exceptions 
will  be  rare  and  that  no  more  than  one  will  be  raised  during  any  one 
exception poll period. However, the possibility exists for contention for
the access to the network during this period. Resolving this contention
is discussed in section 5.5.4. The control node may poll for exceptions
several  times  during  a  communication  cycle.  When  an  exception  is 
raised  it  may  very  likely  disrupt  the  normal  communication  cycles 
just  as  an  interrupt  disrupts  the  normal  flow  of  program  execution  in 
a  processor.  The  actions  taken  by  the  network  and  the  control  node 
depend  on  the  exception  raised. 

4.  After  taking  care  of  servo  messages  and  exceptions  the  network  is  free 
to handle other traffic. From the compiled schedule information the
control node may know of particular messages or threads that should
be  transmitted  at  this  time.  In  general,  the  control  node  can  simply 
poll  the  nodes  for  any  message  they  wish  to  send.  Polling  all  the  nodes 
may  not  be  possible  during  a  single  communication  cycle,  especially 
if  the  polls  produce  large  messages.  In  this  case  the  control  node  can 
continue  the  polling  in  the  following  cycle  using  some  "fair"  schedule. 
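
The four steps above can be sketched as a toy model of the control node. This is not the GANGLIA implementation: the class, its fields, the fixed polling budget, and the plain round-robin order used as the "fair" schedule are all assumptions made for illustration.

```python
class ControlNode:
    """Toy model of the control node's per-cycle schedule (illustrative only)."""

    def __init__(self, servo_table, nodes, polls_per_cycle):
        self.servo_table = servo_table          # compiled table: cycle -> thread ids
        self.nodes = nodes                      # nodes eligible for general polling
        self.polls_per_cycle = polls_per_cycle  # assumed polling budget per cycle
        self.next_poll = 0                      # round-robin position kept across cycles

    def run_cycle(self, cycle_index):
        actions = [("begin_servo",)]
        # Steps 1 and 2: start each servo thread scheduled for this cycle.
        for thread in self.servo_table[cycle_index % len(self.servo_table)]:
            actions.append(("servo_thread", thread))
        # Step 3: give the nodes a chance to raise exceptions.
        actions.append(("exception_poll",))
        # Step 4: poll as many nodes as the budget allows, resuming where
        # the previous cycle stopped.
        for _ in range(self.polls_per_cycle):
            actions.append(("poll", self.nodes[self.next_poll]))
            self.next_poll = (self.next_poll + 1) % len(self.nodes)
        return actions


control = ControlNode(servo_table=[[1], [1], [1], [1, 2]],
                      nodes=["n0", "n1", "n2"], polls_per_cycle=2)
first = control.run_cycle(0)
second = control.run_cycle(1)
```

Because the polling position is kept between calls, the second cycle continues with the node that the first cycle did not reach.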

Another responsibility of the control node is to watch for a lost
token. The GANGLIA protocol requires that a node receiving the token transmit
a message immediately. Under normal circumstances the node has one
or more messages ready to transmit and it simply picks one (using the
thread as a guide) and transmits it. If a node has no messages to transmit
it transmits a null message, passing the token on as indicated by the thread
field.  The  node  can  indicate  that  it  was  unable  to  transmit  the  desired 
message  by  transmitting  an  OVERRUN  message.  If  nothing  else  is  appropriate 
the token is passed to the control node. It is possible that a message is not
successfully received by the node receiving the token; in this case the token
is lost. With this protocol the loss of the token can be detected easily, since
if the control node does not have the token and the network is idle, then
the token is lost. What action the control node should take when it detects
a lost token is in general a very hard problem, since the control node lacks
information  as  to  how  the  token  was  lost  and  the  actual  state  of  the  system. 
A  typical  response  might  be  to  wait  until  the  scheduled  time  for  the  next 
communication  cycle  and  then  start  that  cycle. 
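
A minimal sketch of these two rules follows. The node identifier for the control node, the message labels, and the dictionary-based tables are hypothetical; the OVERRUN case is left out because the text does not say precisely when it applies.

```python
CONTROL_NODE = 0  # hypothetical identifier for the control node


def respond_to_token(thread, ready, next_table):
    """Choose what a node sends the moment it receives the token.

    ready:      dict mapping thread id -> payload waiting to be sent
    next_table: dict mapping thread id -> node that gets the token next
    """
    # With no table entry for this thread, fall back to the control node.
    next_node = next_table.get(thread, CONTROL_NODE)
    if thread in ready:
        return ("DATA", next_node, ready.pop(thread))
    # Nothing to send: a null message still passes the token on.
    return ("NULL", next_node, b"")


def token_lost(control_has_token, network_idle):
    """The control node's test: an idle network with the token elsewhere."""
    return network_idle and not control_has_token
```

The point of the sketch is that a node never holds the token silently, which is what makes the idle-network test sufficient for detecting a lost token.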

5.5.3     Threads. 

Any  but  the  simplest  system  will  have  multiple  servo  loops  running  at  dif- 
ferent rates  and  may  have  complex  traffic  patterns  for  the  steady  state  and 
changing  state  messages.  The  control  node  has  the  task  of  managing  this 
traffic. 

Consider the example system in figure 5.2. A communication cycle might
be started with the control node, C, passing the token to X, which transmits
its message and passes the token on to Y, which does the same and passes the
token to Z, which returns the token to the control node. The control node
polls for exceptions and then polls all the nodes for other types of messages
until the time to start the next cycle. This sequence of passing the token
from C to X to Y to Z to C is an example of a thread. In a more complex
example nodes M and N may transmit messages in a servo loop running
one fourth as often as the above loop. Three out of four servo cycles would
then be just as described above and the fourth cycle could take either of
two forms: a single thread (C to X to Y to Z to M to N to C) or two
concatenated threads (C to X to Y to Z to C followed by C to M to N to
C). The sets of nodes in the various threads do not, of course, have to be
distinct.  It  is  very  likely  that  some  nodes  will  be  part  of  several  threads  and 
on each of the threads the node is to send a different set of data. This leads
to some ambiguity for a node which receives the token: what information is
the node to transmit? That is, what communication thread is in progress?
To help a node disambiguate, an identifier for the thread is included as
part of the token; this is the thread field. Thus the token consists of an
identifier for the node to transmit and a thread identifier which essentially
indicates what data is to be transmitted.
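
The way a thread identifier selects the next node can be sketched with one small table per node. The node labels, thread numbers, and table layout below are illustrative assumptions, not values taken from figure 5.2 or from the CMMU design.

```python
CONTROL = "C"

# Per-node tables, as might be generated at compile time: thread id -> next node.
# Thread 1 is the fast servo loop, thread 2 the slow one, and thread 3 a
# single thread that covers both.
NEXT_NODE = {
    "X": {1: "Y", 3: "Y"},
    "Y": {1: "Z", 3: "Z"},
    "Z": {1: CONTROL, 3: "M"},
    "M": {2: "N", 3: "N"},
    "N": {2: CONTROL, 3: CONTROL},
}


def run_thread(first_node, thread):
    """Follow the token from the control node around one thread and back."""
    path = [CONTROL]
    node = first_node
    while node != CONTROL:
        path.append(node)
        # Each node indexes its own table with the thread field of the token.
        node = NEXT_NODE[node][thread]
    path.append(CONTROL)
    return path
```

Here `run_thread("X", 1)` followed by `run_thread("M", 2)` corresponds to two concatenated threads, while `run_thread("X", 3)` visits the same nodes as one thread. Node Z illustrates the ambiguity discussed above: the same node passes the token to different successors depending on the thread.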

Once  the  servo  part  of  a  communication  cycle  is  complete  the  control 
node elicits other traffic (steady state or changing state) by polling the nodes
on  the  network.  In  a  simple  system  the  nodes  may  be  polled  repeatedly  until 
the  time  for  the  start  of  the  next  cycle.  In  general,  this  part  of  the  cycle  is 
not  so  simple.  There  may  be  too  many  nodes  to  poll  every  communication 
cycle.  There  may  be  priority  constraints  to  be  considered.  There  may  be 
traffic  patterns  that  can  or  should  be  used  to  advantage.  For  instance, 
during  the  change  of  state  of  the  system  it  may  be  known  that  certain 
high level nodes will be transmitting large amounts of data: distributing
new parameter tables, stopping unneeded processes, starting new processes,
etc.  In  this  case  the  control  node  should  poll  these  high  level  nodes  more 
frequently.  There  may,  in  fact,  be  specific  sequences  for  which  threads  could 
be  established  to  speed  the  transition. 

Where do threads come from? Threads are specified by the system
designer.  All  scheduling  could  be  done  by  simple  polling  but  by  turning 
common  sequences  into  threads  some  of  the  communication  bandwidth  is 
saved.  An  additional  benefit  is  that  the  thread  identifier  in  the  token  indi- 
cates to  the  receiving  node  what  sort  of  message  is  expected.  A  useful  tool 
would  be  a  GANGLIA  compiler  which  would  analyze  traffic  patterns,  set  up 
threads,  and  generate  scheduling  tables  for  the  control  node. 

5.5.4     Exceptions. 

The  above  discussion  describes  normal  operation  and  ignores  exceptional 
conditions.  Exceptions  are  unscheduled,  often  undesirable,  events.  The 
problem  they  present  to  the  access  scheduler  is  that  since  their  occurrence 
cannot be predicted they play havoc with any schedule the control node has
set up. One solution would be to have the control node poll the individual
nodes for exceptional conditions. Since it is hoped that these events are
uncommon, such repeated polling seems wasteful. To exacerbate the situation,
exceptions should be treated with a high priority, which means the polling
would have to be done often. Instead a special message, the exception poll,
is  available.  This  message  is  transmitted  by  the  control  node  when  it  is 
ready  to  receive  exceptions.  After  the  exception  poll  is  transmitted  the 
network  is  left  idle  for  a  short  interval  during  which  any  node  wishing  to 
raise  an  exception  begins  transmitting  the  exception  message.  This  places 
a  scheduling  constraint  on  the  control  node,  since  it  must  broadcast  an 
exception poll often enough to satisfy the most time critical exception in
the system.

As  mentioned  previously,  contention  for  access  to  the  network  can  occur 
during  this  interval.  To  resolve  contention  a  jamming  preamble  is  affixed  to 
exception  messages.  The  highest  priority  exception  has  the  longest  pream- 
ble so  that  it  will  still  be  jamming  when  other  exception  nodes  have  begun 
transmitting  the  message  body.  While  transmitting,  nodes  listen  for  colli- 
sions and  stop  transmitting  when  any  collision  is  detected.  Thus  if  multiple 
exceptions  are  raised  simultaneously  then  all  but  the  highest  priority  node 
will detect a collision and stop transmitting, leaving the network for the
highest  priority  message.  This  protocol  has  the  curious  side  effect  that  the 
highest  priority  exception  has  the  longest  preamble  and  thus  the  longest 
delay  to  transmit.  The  delay  caused  by  the  jamming  preamble  is  small  com- 
pared to  the  communication  cycle  and  delays  caused  by  the  demands  for 
other  traffic.  Nonetheless,  this  contention  resolution  scheme  is  not  pleasing 
and requires more attention.
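
The effect of the jamming preamble can be shown with a small model. It assumes, purely for illustration, that the preamble lasts a number of time slots equal to the exception priority; the text does not specify the preamble lengths, and it does not say what happens when two nodes raise exceptions of equal priority, so the sketch leaves that case unresolved.

```python
def resolve_contention(contenders, slot_time=1.0):
    """Return (winner, delay) for a dict of node -> exception priority.

    Assumes the preamble lasts priority * slot_time. Nodes with shorter
    preambles start their message body while a longer preamble is still
    jamming, detect the collision, and stop. A tie at the top priority is
    left unresolved (winner is None) in this sketch.
    """
    if not contenders:
        return None, 0.0
    top = max(contenders.values())
    leaders = [node for node, p in contenders.items() if p == top]
    winner = leaders[0] if len(leaders) == 1 else None
    # The winner cannot begin its message body until its own preamble ends.
    return winner, top * slot_time
```

The returned delay makes the side effect noted above visible: the higher the priority of the winning exception, the longer it waits before its message body starts.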

One further issue needs to be addressed: what is to be done when the
system  is  flooded  with  exceptions?  This  is  likely  when  truly  exceptional 
events  occur.  Consider  an  arm  that  unexpectedly  encounters  a  wall.  Force 
sensors  at  the  joints  and  end  effector  may  detect  excessive  forces  and  several 
nodes  may  try  to  raise  the  exception.  Meanwhile,  it  appears  to  position 
sensors  on  the  joints  that  the  actuators  are  failing  since  the  joints  do  not 
move as commanded, causing more exceptions to be raised. A flood of exceptions
like this is likely to interfere with an effective response to the real crisis.
Ganglia's solution is to adapt the technique that CPUs use to handle a
flood of interrupts: prioritization of the interrupts. Each exception message
has  a  priority  in  the  range  1  to  7  with  7  being  the  highest  priority.  Included 
as  part  of  the  exception  poll  is  the  current  exception  level  of  the  network. 
A node responds to the exception poll only if it has a pending exception
whose priority is greater than the network's exception level. An exception
message itself includes the priority of that exception so that the control
node can raise the network's level appropriately. Restoring the exception
level  after  an  exception  has  been  properly  noted  or  handled  is  ultimately 
the  responsibility  of  the  control  node  but  the  node  raising  the  exception, 
other  affected  nodes  on  the  network,  and  higher  level  control  may  all  be 
involved  in  the  response  to  an  exception.  In  a  well  designed  system  the 
response  to  most  exceptions  will  be  planned  out  and  the  control  node  can 
restore  the  exception  level  when  the  plan  has  been  carried  out. 
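
The exception level mechanism can be sketched as follows. The class and method names are hypothetical, and the sketch assumes that contention resolution always lets the highest priority exception through and that the control node raises the level to the priority of the exception it receives.

```python
class ExceptionLevel:
    """Toy model of the network exception level kept by the control node."""

    def __init__(self):
        self.level = 0  # level 0 lets every priority from 1 to 7 respond

    def exception_poll(self, pending):
        """pending: dict node -> priority (1..7). Return the responder or None."""
        eligible = {n: p for n, p in pending.items() if p > self.level}
        if not eligible:
            return None
        # Assumption: contention resolution lets the highest priority through.
        node = max(eligible, key=eligible.get)
        self.level = eligible[node]  # the control node raises the level
        return node

    def restore(self, level=0):
        """Called once the planned response to the exception is complete."""
        self.level = level
```

Once a priority 5 exception has been accepted, a pending priority 3 exception stays silent on later polls until the control node restores the level, while a priority 7 exception can still get through.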
Field Name   Bytes   Description
----------   -----   -----------
next         1       Node identifier of the next node to transmit.
                     This and the thread make up the token.

thread       1       A modifier for the next node (refer to section 5.5.3).

flags        1       Flags:
                       EMPTY_MESSAGE
                       EXCEPTION
                       ACKNOWLEDGEMENT_REQUESTED
                       MESSAGE_TYPE: (5 bits)
                         EXCEPTION_LEVEL 0-7
                         OVERRUN
                         EXCEPTION_POLL
                         IDLE_NETWORK
                         CLEAR_TEST_AND_SET
                         TEST_AND_SET
                         TEST_AND_SET_REPLY (0)
                         TEST_AND_SET_REPLY (1)
                         POSITIVE_ACKNOWLEDGEMENT
                         NEGATIVE_ACKNOWLEDGEMENT

source       1       Node identifier of the node transmitting the message.

name         3       Name for the data in the message; this name
                     is unique across the network.

length       1       The number of bytes of data; a value of zero
                     indicates 256 bytes of data. If the data field is
                     empty the EMPTY_MESSAGE flag is set.

time         4       A time stamp on the data.

data         0-256   The data.

Table 5.1: Message structure
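
The layout in Table 5.1 can be exercised with a short encoding sketch. The field order and sizes follow the table; the big-endian byte order, the bit position chosen for EMPTY_MESSAGE, and the function names are assumptions, since the table does not specify them.

```python
import struct

# next, thread, flags, source (1 byte each), name (3), length (1), time (4)
HEADER = struct.Struct(">BBBB3sBI")
EMPTY_MESSAGE = 0x80  # assumed bit position within the flags byte


def pack_message(next_node, thread, flags, source, name, time, data=b""):
    """Build a message; name is an integer that fits in three bytes."""
    if len(data) > 256:
        raise ValueError("data field is limited to 256 bytes")
    if data:
        flags &= ~EMPTY_MESSAGE & 0xFF
    else:
        flags |= EMPTY_MESSAGE
    length = len(data) % 256  # 256 bytes of data is encoded as zero
    header = HEADER.pack(next_node, thread, flags, source,
                         name.to_bytes(3, "big"), length, time)
    return header + data


def unpack_message(raw):
    """Split a raw message into its fields, undoing the length encoding."""
    next_node, thread, flags, source, name, length, time = HEADER.unpack_from(raw)
    size = 0 if flags & EMPTY_MESSAGE else (length or 256)
    return {"next": next_node, "thread": thread, "flags": flags,
            "source": source, "name": int.from_bytes(name, "big"),
            "time": time, "data": raw[HEADER.size:HEADER.size + size]}
```

The header occupies 12 bytes. The EMPTY_MESSAGE flag is what distinguishes an empty data field from a full 256 byte one, since both would otherwise carry a length of zero.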
The CMMU


The Communication Memory Management Unit is the interface between
a processor in a node and the GANGLIA network. The CMMUs collectively
maintain the global memory and individually provide access to the network
for their respective processors. A CMMU is essentially a dual ported memory
management unit. It has three major functions:

•  To  provide  access  to  the  local  portion  of  the  global  memory,  the  com- 
munication memory,  for  the  processor.  The  processor  reads  or  writes 
to  the  communication  memory  by  presenting  the  global  address  of  the 
desired  word  to  the  CMMU  which  translates  this  into  the  physical  lo- 
cation in  the  communication  memory  and  either  fetches  or  stores  the 
value  depending  on  the  operation. 

•  To maintain the local portion of the global memory by monitoring the
network traffic. When a message containing a data item which is
maintained locally is received, the data is stored in the communication
memory.

•  To  provide  access  to  the  network  for  the  processor.  If  the  processor 
needs  to  transmit  a  message  it  constructs  the  appropriate  data  item 
in  the  communication  memory  and  instructs  the  CMMU  to  transmit 
the  message.  When  the  CMMU  receives  a  token  appropriate  for  this 
message  the  message  is  transmitted.  In  this  regard  the  CMMU  is  also 
responsible  for  supporting  the  integrity  of  the  network  by  participat- 
ing in  the  GANGLIA  protocol.  When  the  token  is  passed  to  a  node  the 
CMMU  is  responsible  for  responding  immediately  with  the  appropri- 
ate message  or  a  null  message  if  nothing  is  appropriate.  When  the 
processor  wishes  to  raise  an  exception  the  CMMU  is  responsible  for 
obeying the protocol concerning exceptions.

6.1     Memory map table.

The  core  of  the  CMMU  is  the  memory  map  table  which  translates  global 
memory  references  to  communication  memory  locations  and  contains  other 
information  about  the  items  in  global  memory.  Table  6.1  shows  the  structure 
of  an  entry  in  the  map  table.  The  table  itself  is  maintained  primarily  by  the 
processor.  Programs  running  on  the  processor  know  what  data  items  are  of 
interest  and  which  they  will  want  to  transmit  and  they  must  prepare  the 
table  accordingly.  As  messages  are  received  and  transmitted  on  the  network 
the  CMMU  modifies  the  table  entries  accordingly. 

The  table  is  accessed  by  associative  search  for  the  most  part.  The  keys 
used  for  the  search  consist  of  the  name  and/or  thread  and  various  of  the 
flags. The entries can also be accessed randomly by the processor. There
will be times when more than one of the entries will match a search key. In
this case, a specific entry will be chosen deterministically: the entry lowest
in  the  sequential  ordering  of  the  entries. 
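
The tie-breaking rule can be expressed in a few lines. The dictionary representation of a table entry and the field names used as keys are illustrative assumptions; in the CMMU the search would be done associatively in hardware rather than by a loop.

```python
def find_entry(table, **key):
    """Return the index of the lowest entry matching every key field, or None."""
    for index, entry in enumerate(table):
        if all(entry.get(field) == value for field, value in key.items()):
            return index
    return None


# Three hypothetical entries for the same data item.
table = [
    {"name": 0x000102, "thread": 1, "pending": False},
    {"name": 0x000102, "thread": 1, "pending": True},
    {"name": 0x000102, "thread": 1, "pending": True},
]
```

A search on the name and the pending flag matches the second and third entries and returns the second, the one lowest in the sequential ordering.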

The fields of the table entries are described in more detail below.

Name  is  the  global  name  of  the  data  item.  This  is  assigned  when  the 
system  is  compiled  and  the  same  name  is  used  throughout  the  network.  It 
is  possible,  in  fact  likely,  that  there  will  be  more  than  one  entry  in  the  table 
for  a  given  item,  as  will  be  seen  below. 

Thread is the thread identifier. When a node receives the token the CMMU
searches  the  map  table  for  an  item  that  is  ready  to  transmit  and  has  thread 
matching  the  thread  identifier  from  the  message  just  received.  One  of  the 
matching  items  is  then  transmitted. 

Flags contains various flags associated with this table entry. Table 6.2
describes the flags.

Next/source  holds  a  node  identifier.  If  this  item  is  being  transmitted 
this  is  used  for  the  Next  field  of  the  message.  If  this  item  is  received  the 
source  field  from  the  message  is  stored  here. 

Time is the time stamp for the data item. When a message is created the time
is  placed  here.  When  it  is  transmitted  the  time  field  is  included. 

Mthread  contains  the  thread  field  for  the  message.  As  with  the  time 
field  it  is  either  a  source  or  destination  depending  on  how  the  entry  is  used. 

Mflags contains the flags field for the message. Again it is either a
source  or  destination  depending  on  how  this  entry  is  used. 

Location  is  the  address  in  the  communication  memory  where  this  item 
is  stored.  When  a  message  is  received  for  the  data  item  associated  with  this 
entry the data portion of the message is stored starting at this location in
the communication memory. When a reference from the processor to global
memory uses this entry the lowest byte of the virtual address is added to
location to obtain the actual address in the communication memory. Note that the
page  size  in  the  global  memory  is  256  bytes  but  the  location  pointer  does 
not  have  to  be  on  a  page  boundary.  Depending  on  communication  memory 
design  considerations  it  may  have  to  be  aligned  on  a  word  boundary.  The 
reason  that  the  entries  in  the  map  table  do  not  correspond  to  memory  frames 
as  with  typical  memory  management  units  is  that  each  named  data  item 
(i.e.  the  data  transmitted  in  a  named  message)  is  one  page  in  the  global 
memory but most of the items are much less than 256 bytes and the
communication  memory  on  the  nodes  would  be  terribly  fragmented  if  each 
item  used  one  frame  of  the  memory. 

6.2     CMMU  operation. 

The  programs  on  the  processor  are  responsible  for  maintaining  the  memory 
map  table.  Typically,  a  program  (or  the  operating  system  on  the  node)  will 
set  up  the  map  table  entries  during  program  initialization  and  manipulation 
of  the  table  while  the  program  is  running  will  be  minimal,  although  dynamic 
changing  of  the  table  is  possible  and  even  desirable  in  some  cases.  The 
remaining  parts  of  this  section  describe  how  typical  functions  are  performed. 

6.2.1     Processor  references. 

For the processor to reference a global data item in the communication memory
four conditions must be met. First, there must be a map table entry with
its name field set to the global name of the item. This is true if the first
three bytes of the global address match the name field. Second, the location
field of the same entry must point to the appropriate location in the
communication memory. Third, the LOCALLY_MAPPED flag must be set. Fourth,
the BUSY flag must be off, i.e. the entry must not be in use by the CMMU. If
these conditions are met the low order byte of the global address is added to
the location field and the resulting address is used to access the
communication memory. The name field and the location field must be explicitly
set by the processor, the LOCALLY_MAPPED flag can be set either by the processor
or the CMMU, and BUSY is set by the CMMU when it is using the entry.
If the conditions are not satisfied a page fault is generated to the processor.
The usual response of the processor, if the entry is busy, is to
postpone the reference until the entry is no longer busy. This is
rather costly to the program making the reference and if it is on a critical
time  path  when  the  fault  occurs  the  result  can  be  disastrous.  The  following 
paragraphs  discuss  ways  to  avoid  access  conflicts. 
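
The translation just described can be sketched as below. The sketch assumes a four byte global address (three bytes of name followed by one byte of offset) and represents map table entries as dictionaries; the exception class and the field names are hypothetical.

```python
class PageFault(Exception):
    """Raised when a processor reference cannot be satisfied."""


def translate(table, global_address):
    """Map a global address to a communication memory address."""
    name = global_address >> 8      # first three bytes: the global name
    offset = global_address & 0xFF  # low order byte: offset within the item
    for entry in table:
        if entry["name"] == name and entry["locally_mapped"]:
            if entry["busy"]:
                raise PageFault("entry is in use by the CMMU")
            return entry["location"] + offset
    raise PageFault("no locally mapped entry for this name")


# One hypothetical entry whose data starts at 0x40 in communication memory.
table = [{"name": 0x000102, "locally_mapped": True, "busy": False,
          "location": 0x40}]
address = translate(table, 0x00010210)
```

The location need not lie on a page boundary, so the offset is added rather than concatenated as it would be in a conventional memory management unit.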

6.2.2     Synchronization. 

There are basically two ways to avoid simultaneous access to a data item.
The first is to synchronize the data producing process and the data consuming
process. Many of the activities in a robot control system are synchronized
around various control cycles, so producing and consuming
processes are often already synchronized. For example, a system design can
be such that a particular data item is always produced and transmitted near
the end of a cycle, allowing the consuming processes to use the item during
the beginning of a cycle without fear of simultaneous access. The second
way to synchronize is to let the producer process "drive" the consumer process.
For instance the consuming node can set the INTERRUPT_WHEN_DONE
flag, which will cause the processor to be interrupted when the message is
received. Assuming it does not need the data for "too long" it can use the
data without fear of interference from the network. If the two processes
cannot be easily synchronized it is possible to double-buffer the messages
for the data item. Briefly, this is accomplished by keeping two entries for
the data item: one is mapped into the local address space and is available
to the processor; the other is marked as pending and is ready to receive an
incoming message. The local program is responsible for manipulating the
table entries appropriately. A process that transmits data has a similar
problem. It arises because there is some time between when a data item is
marked as waiting for transmission (i.e. its PENDING flag is set) and when
the transmission is complete and the area of communication memory can be
modified. The solutions are similar: transmission can be synchronized
with production of the message either by design, by using interrupts, or
by double-buffering.

The purpose of the DISABLE_WHEN_DONE and the MAP_WHEN_DONE flags is
to reduce the communication overhead to the local program. If the producing
and consuming processes are synchronized in such a fashion that
there is never a conflict over access to the communication memory location
for a data item then a single entry in the table can be used for
the item and the PENDING and LOCALLY_MAPPED flags can be left on at all
times (i.e. the DISABLE_WHEN_DONE and MAP_WHEN_DONE flags are off). In this
case the local program uses the communication memory area according to
its schedule and the CMMU will transmit or receive as the network indicates
(i.e. when the message or token arrives) and no special communication
overhead is required. If, however, there is a danger of the CMMU and the
processor interfering with each other then these flags can help to avoid
conflict. If the DISABLE_WHEN_DONE flag is on then when a message is received or
transmitted the PENDING flag is automatically turned off upon completion
so that the same entry (and area in communication memory) cannot be
used again by the CMMU until explicitly enabled by the processor. The
MAP_WHEN_DONE flag controls access by the processor. If it is on then
the LOCALLY_MAPPED flag is automatically set at the end of transmission or
reception so that the processor will be able to access the data as soon as it
is available but will fail if it tries to access it prematurely.

6.2.3     Transmitting  messages. 

To transmit a data item the processor prepares the data in the communication
memory area indicated by the location field of the entry. If the
entry is mapped into the local space then this is accomplished simply by
referring to the locations by their global names as in the previous section.
Various fields in the map table entry must also be set. The mthread, next,
and length fields must be set to the appropriate values; these are usually
determined when the system is compiled. The message will be transmitted
when a token is received for this node with the indicated thread, and the token
will be passed to the node indicated by next. The time field should be
set to the current time to distinguish this transmission of the data from
previous and following transmissions. The flags for the outgoing message
should be set appropriately in mflags. Finally, the flags in the table entry
must be set. In particular, LOCALLY_MAPPED should probably be turned off,
RECEIVE/TRANSMIT should be set to transmit, INTERRUPT_WHEN_DONE and
DISABLE_WHEN_DONE should be turned on or off, as desired, and the PENDING
flag should be turned on.

When  the  token  is  passed  to  this  node  with  the  thread  field  of  the 
message  matching  the  thread  field  in  the  table  entry  the  data  item  will  be 
transmitted. During transmission, the BUSY flag will be turned on. At the
end  of  transmission  the  BUSY  flag  will  be  turned  off  and  the  local  processor 
will  be  interrupted  if  the  INTERRUPT_WHEN_DONE  flag  is  on  and  the  PENDING 
flag  will  be  turned  off  if  the  DISABLE_WHEN_DONE  flag  is  on. 
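
The flag changes at the end of a transfer can be summarized in a short sketch. The bit values assigned to the flags are arbitrary assumptions, and the handling of MAP_WHEN_DONE is taken from the description in section 6.2.2 rather than from the paragraph above.

```python
# Assumed bit assignments; only the flag names come from the text.
PENDING, BUSY, LOCALLY_MAPPED = 0x01, 0x02, 0x04
INTERRUPT_WHEN_DONE, DISABLE_WHEN_DONE, MAP_WHEN_DONE = 0x08, 0x10, 0x20


def finish_transfer(flags):
    """Return (new_flags, interrupt) when a transmission or reception ends."""
    flags &= ~BUSY
    if flags & DISABLE_WHEN_DONE:
        flags &= ~PENDING        # entry stays idle until the processor re-enables it
    if flags & MAP_WHEN_DONE:
        flags |= LOCALLY_MAPPED  # data becomes visible to the processor
    return flags, bool(flags & INTERRUPT_WHEN_DONE)
```

With neither DISABLE_WHEN_DONE nor MAP_WHEN_DONE set, only BUSY changes, which matches the case in which producer and consumer are already synchronized and a single entry is left permanently pending.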

As  mentioned  above  it  is  sometimes  necessary  to  double-buffer  data 
items. When a node responsible for generating a data item must double
buffer, it sets up two entries in the map table for the same global data item.
The name, thread, mthread, mflags, next/source, and many of the flags
fields would be the same in the two entries. The location pointers would
point to distinct areas of the communication memory. Initially, one of the
entries would be locally mapped and the other would be neither mapped nor
marked as pending. The processor prepares the first message for transmission
using the mapped entry; when ready the entry is prepared for transmission
and the PENDING flag is turned on. Now, while waiting for the message to
be transmitted the processor can prepare the next occurrence of this message
by turning on the LOCALLY_MAPPED flag in the second entry. When it is
ready  for  transmission  it  can  be  marked  as  pending  if  the  previous  message 
has  already  been  transmitted.  The  processor  can  then  map  the  first  entry 
again  and  prepare  the  next  message  using  this  entry. 

6.2.4     Receiving  messages. 

To  receive  a  data  item  the  node  allocates  a  table  entry  and  an  area  in  the 
communication  memory  for  the  data  item,  points  the  location  field  of  the 
entry  to  the  memory  area,  sets  the  name  and  length  fields  of  the  entry, 
and  turns  on  the  PENDING  flag  and  clears  the  RECEIVE/TRANSMIT  flag  (to 
indicate receive). When a message is received the CMMU searches the map
table for an entry with its PENDING flag set and its name field matching the
name field of the incoming message. If an entry is found then the location
pointer of the entry points to the area to store the message and length
indicates the amount of physical memory allotted to the message. If the
INTERRUPT_WHEN_DONE flag is turned on then the processor is interrupted
when the message is successfully received. The DISABLE_WHEN_DONE flag,
if set, causes the CMMU to turn off the PENDING flag after a message is
received. This prevents another incoming message from using the table
entry until the processor explicitly enables it again. If the MAP_WHEN_DONE
flag is on the CMMU sets the LOCALLY_MAPPED flag after the message is
received. Thus the data item will automatically appear in the processor's
address  space.  The  other  fields  in  the  table  entry  are  set  according  to  the 
incoming  message  so  the  processor  can  examine  the  mthread,  source,  time 
and  message  flags  (mflags)  if  desired. 

Double buffering for received data items is similar to transmitted items.
Two table entries are set up with the same name and length fields and
distinct location fields. One of the entries is marked as pending while the
other is mapped. The processor uses the mapped entry while the CMMU
uses the pending entry to receive an incoming message. When a message is
received the processor can switch the two entries, as desired. Again more
than two entries can be used as long as only one is pending and only one is
mapped at any time.
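
Double buffering on the receiving side can be sketched with two memory areas whose roles are exchanged by the processor. The class and method names are hypothetical, and the two table entries are reduced here to two indices.

```python
class DoubleBufferedItem:
    """Two entries for one data item: one mapped, one pending."""

    def __init__(self):
        self.areas = [b"", b""]  # two distinct areas of communication memory
        self.mapped = 0          # entry visible to the processor
        self.pending = 1         # entry the CMMU may receive into

    def cmmu_receive(self, data):
        """The CMMU stores an incoming message in the pending area."""
        self.areas[self.pending] = data

    def processor_read(self):
        """The processor reads only from the mapped area."""
        return self.areas[self.mapped]

    def switch(self):
        """The processor exchanges the roles of the two entries."""
        self.mapped, self.pending = self.pending, self.mapped


item = DoubleBufferedItem()
item.cmmu_receive(b"sample 1")
item.switch()
item.cmmu_receive(b"sample 2")  # arrives while the processor uses sample 1
current = item.processor_read()
```

The second message lands in the area that the processor is not using, so `current` still holds the first sample until the processor chooses to switch again.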

6.2.5     Test  and  set 

The most common primitive operation for inter-processor synchronization
is  the  test  and  set  operation.  Modern  micro-processors  include  test  and  set 
operations  in  some  form  [Moto85]  [Inte86].  Micro-processor  busses  facilitate 
test  and  set  operations  by  allowing  a  bus  master  to  lock  the  bus,  preventing 
access  by  other  bus  masters,  for  a  series  of  operations  [Micr85]  [Inte84].  The 
important  condition  for  a  test  and  set  operation  is  that  the  operation  be 
atomic  thus  preventing  race  conditions  among  the  competing  processors. 

It  is  important  that  GANGLIA  provide  a  test  and  set  facility.  To  ensure 
the integrity of a test and set operation, it is clear that for any particular
synchronization  variable,  a  single  node  must  be  responsible  for  maintaining 
it.  Which  node  is  not  so  important  but  a  single  node  must  maintain  the  true 
state  of  the  variable.  Other  nodes  must  "request"  a  test  and  set  operation  on 
the  variable  by  the  responsible  node  and  the  responsible  node  must  perform 
the  operation  as  an  atomic  operation  and  report  the  results. 
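
The role of the responsible node can be sketched as follows. The message type names are those of Table 5.1; the class, the variable name used in the example, and the interpretation of the reply value as the previous state of the variable are assumptions made for illustration.

```python
class ResponsibleNode:
    """Holds the true state of the synchronization variables it maintains."""

    def __init__(self):
        self.variables = {}  # variable name -> 0 (free) or 1 (held)

    def handle(self, message_type, name):
        """Serve one request at a time, so each operation is atomic."""
        if message_type == "TEST_AND_SET":
            old = self.variables.get(name, 0)
            self.variables[name] = 1
            # Assumed meaning of the reply: the value found before setting.
            return ("TEST_AND_SET_REPLY", old)
        if message_type == "CLEAR_TEST_AND_SET":
            self.variables[name] = 0
            return None
        raise ValueError("unsupported message type: " + message_type)


node = ResponsibleNode()
first = node.handle("TEST_AND_SET", "gripper")   # requester acquires the variable
second = node.handle("TEST_AND_SET", "gripper")  # a later requester is refused
```

Because every request for a given variable passes through the one node that maintains it, two requesters can never both see the variable as free.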

A simple approach (from the standpoint of GANGLIA design) would be
to have the processor in a node handle the request. A node requiring access
to a resource would send a message to the node responsible for the
synchronization variable for that resource, requesting access (and
simultaneously excluding access by other nodes). The processor in the
responsible node would be interrupted when the message arrived and would
perform the necessary operations, including setting up the reply message
for transmission. The reply message would be transmitted at some point when
the token was passed to the responsible node, and the requesting node would
then get its reply. This involves a great deal of overhead: the responsible
node is interrupted, and it must interpret an incoming message, perform the
operation, and prepare an outgoing message. Meanwhile the requesting node
must wait for an undetermined duration for the reply.

A preferable approach for implementing the test and set is to have the
CMMU in the responsible node handle the operation. In this case the
requesting node would again send a test and set request to the responsible
node, and the token would be passed to the responsible node as part of the
request. The CMMU receiving the request (along with the token) would
perform the test and set operation and report the results immediately. This
is more appealing than the previous approach since the processor in the
responsible node is not involved in the operation and the reply is returned
as soon as possible (i.e. in the next message). One complication is that the
CMMU must ensure that its local processor does not access the
synchronization variable in the midst of the test and set operation. But since
the CMMU controls access to the communication memory by the local processor,
it can ensure the integrity of the operation.

When the CMMU receives a TEST_AND_SET message it checks the map
table for the entry indicated by the name field of the message. If none is
found, a NEGATIVE_ACKNOWLEDGEMENT message is sent and the token is passed
to the originating node. If an entry is found, the CMMU checks the status
of the TEST_AND_SET flag in the entry, sends the appropriate response,
and passes the token to the originating node. The TEST_AND_SET flag is
also turned on. The TEST_AND_SET flag is cleared by a CLEAR_TEST_AND_SET
message.
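
The handling of these two messages might be sketched as below. This is a software illustration of behaviour intended for the CMMU hardware, not the CMMU design itself. The class names and the reply labels `WAS_CLEAR` and `ALREADY_SET` are assumptions introduced here; the text only says that "the appropriate response" is sent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Entry:
    """Hypothetical map table entry holding only the TEST_AND_SET flag."""
    test_and_set: bool = False


class Cmmu:
    def __init__(self) -> None:
        # Map table indexed by the global name of the data item.
        self.map_table: Dict[int, Entry] = {}

    def handle_test_and_set(self, name: int, origin: int) -> Tuple[str, int]:
        """Return (reply message, node that receives the token)."""
        entry = self.map_table.get(name)
        if entry is None:
            return ("NEGATIVE_ACKNOWLEDGEMENT", origin)
        was_set = entry.test_and_set
        entry.test_and_set = True  # the flag is turned on in either case
        # Reply labels are assumed names, not taken from the protocol.
        return ("ALREADY_SET" if was_set else "WAS_CLEAR", origin)

    def handle_clear_test_and_set(self, name: int) -> None:
        entry = self.map_table.get(name)
        if entry is not None:
            entry.test_and_set = False


cmmu = Cmmu()
cmmu.map_table[7] = Entry()
first = cmmu.handle_test_and_set(7, origin=3)   # node 3 wins the variable
second = cmmu.handle_test_and_set(7, origin=4)  # node 4 finds it taken
```

Because the reading and the setting of the flag happen inside one handler, no other request can be interleaved between them, which is the atomicity the operation requires.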

Placing responsibility for the test and set on the CMMU complicates
its design. Up to this point the network functions of the CMMU, receiving
and storing incoming data and watching for the token and transmitting
messages, were quite independent. One could imagine their implementation
as independent state machines within the CMMU. However, the introduction
of the test and set operation requires that the receiving and transmitting
portions of the CMMU be intertwined.

A final note should be made here. Implementing an efficient test and set
operation is important and will increase the flexibility of GANGLIA systems.
But a major tenet of the philosophy behind GANGLIA is that robot control
systems are highly synchronized and that this synchronization can be used
to advantage in the overall system design. In particular, many conflicts for
access to shared resources can be avoided by the inherent synchrony of the
system, so mechanisms such as the test and set operation will not be
necessary in these cases. This is not to say that a GANGLIA system does
not need synchronization primitives (any non-trivial system will need them),
but that the most frequent and time-critical instances of potential resource
conflict can be avoided by taking advantage of the synchrony of the overall
system.

6.2.6     Message acknowledgements.

Three inter-related assumptions have been made to support the claim that
GANGLIA is a feasible architecture for robot control systems. First, a
communication medium can be found that will operate with suitable reliability
in the environment of a robot. It need not be totally reliable but it must be
sufficient to support the system. Second, much (if not the majority) of the
communication traffic is such that an occasional lost packet can be
tolerated. This is reasonable, since much of the data would be smoothly varying
and would be continuously reported. Third, distributed programs can be
developed to effectively operate in this environment. It is unreasonable to
assume that these three assumptions hold at all times. Some data must
be successfully transmitted, the code for a down-loaded program for example.
Sometimes a node must know "beyond a reasonable doubt" that another
node has received and accepted some data. GANGLIA can support normal
data-gram messages, so handshaking protocols available on regular networks
are available on GANGLIA. The CMMU further assists reliable communication
in the following ways.

The CMMU keeps a copy of the last successfully received or transmitted
message header regardless of whether or not the data in the message was of
interest to the node. In addition, the CMMU can interrupt the processor if
the network is ever idle. The GANGLIA protocol specifies that the network is
never idle except at times specified by the control node. At other times the
network will only go idle if the node indicated in the token fails to receive
the token. Thus if node X, in figure 5.2, wishes some assurance that a
message sent to M is received, it can pass the token to M along with the
message and then monitor the network using these features of the CMMU.
If the network goes idle and the last header is the message to M then it is
likely that the message was not received by M. If it does not go idle, or it
goes idle after the message is transmitted but other traffic was observed, then
the message was received by M. This scheme can be extended to sequences
of messages. A node can request to be informed if the network goes idle
while the transmission of the sequence of messages is in progress. If the
network becomes idle then the last observed header may indicate where the
sequence failed. If the network never goes idle during the sequence then
all the messages were successfully transmitted. Note that the fact that the
node indicated by the token successfully transmits indicates that it received
the message but does not indicate that any other node necessarily received
the token. Thus the above scenarios give no assurance that the messages
were received by all interested parties. Also note that a node responding
to the token indicates only that the message was not damaged during the
transmission. It does not indicate that the receiving node agrees in any sense
with the message or understands it.
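
The sender's inference for a single message can be written as a small predicate. This is a sketch only: the function and parameter names are hypothetical, and the two inputs stand for the CMMU's idle interrupt and its saved copy of the last header.

```python
def message_probably_received(network_went_idle: bool,
                              last_header_is_ours: bool) -> bool:
    """Apply the idle-monitoring rule from the sender's point of view.

    The message is presumed lost only when the network fell idle and the
    last header seen on the network is still the sender's own message,
    meaning the node named in the token never took its turn.
    """
    return not (network_went_idle and last_header_is_ours)
```

As the text notes, a result of `True` says only that the node named in the token responded. It says nothing about other interested nodes, nor about whether the receiver accepted the contents.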

In a manner similar to the test and set operation described above, a
node can request that the CMMU of the node receiving the token respond
immediately with an acknowledgement. This is a more familiar and direct
form of acknowledgement which can be used if desired. When a node
receives a message with its ACKNOWLEDGEMENT_REQUESTED flag set, the CMMU
replies to the originating node with an acknowledgement. If there is an entry
matching the data item in the name field of the message then the CMMU
replies with a POSITIVE_ACKNOWLEDGEMENT message; otherwise it replies with
a NEGATIVE_ACKNOWLEDGEMENT message.

Field Name     Bytes   Description

name           3       The global name of the data item.

thread         1       The thread identifier. See section 5.5.3.

flags          2       Flags:
                         LOCALLY_MAPPED
                         EXCEPTION_MESSAGE
                         EMPTY_MESSAGE
                         RECEIVE/TRANSMIT
                         PENDING
                         BUSY
                         DONE
                         INTERRUPT_WHEN_DONE
                         DISABLE_WHEN_DONE
                         MAP_WHEN_DONE
                         TEST_AND_SET

next/source            A node identifier used either as the next field
                       when transmitting or to hold the source field
                       when the item is received.

length                 The length of a message. Zero indicates 256
                       bytes. The EMPTY_MESSAGE flag is used to
                       indicate a message of zero length.

time                   A time stamp from the last time the item was
                       received.

mthread                The thread field from the last message received
                       for this entry or to be transmitted with this
                       message.

mflags                 The flags field from the last time the item was
                       received or to be transmitted with the message.

location       4       The location of this item in the communication
                       memory.

Table 6.1: Memory map table entry

Flag                   Description

LOCALLY_MAPPED         When set this flag indicates that the entry can
                       be used in memory references by the processor.

EXCEPTION_MESSAGE      When set this flag indicates this is for a
                       pending exception message. This message may be
                       transmitted when the CMMU receives an
                       exception poll.

EMPTY_MESSAGE          When set this flag indicates that the message
                       is empty.

RECEIVE/TRANSMIT       This flag indicates whether this entry is for
                       receiving or transmitting the data item.

PENDING                When set this flag indicates to the CMMU
                       that the reception or transmission of this item
                       is pending.

BUSY                   When set this flag indicates that the CMMU
                       is using the table entry. Usually this means
                       that transmission or reception is in process.

DONE                   This flag is set when a transfer is complete. It
                       is cleared when the PENDING flag is set.

INTERRUPT_WHEN_DONE    When set this flag indicates that the CMMU
                       should interrupt the processor when the DONE
                       flag is set.

DISABLE_WHEN_DONE      When set this flag indicates to the CMMU
                       that the PENDING flag should be cleared when
                       a transfer is complete. Thus further transfers
                       via this table entry are prevented until the
                       PENDING flag is explicitly set by the processor.

MAP_WHEN_DONE          When set this flag indicates to the CMMU
                       that the LOCALLY_MAPPED flag should be set
                       when a transfer is complete.

Table 6.2: Memory map entry flags
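
The interaction of the flags when a transfer completes can be illustrated with a short sketch. The flag names are those of tables 6.1 and 6.2, but the bit positions, the polarity of the RECEIVE/TRANSMIT bit, and the function names are assumptions made here for illustration; the tables do not specify them.

```python
from enum import IntFlag
from typing import Tuple


class EntryFlags(IntFlag):
    # Bit positions are assumed; only the flag names come from the tables.
    LOCALLY_MAPPED = 1 << 0
    EXCEPTION_MESSAGE = 1 << 1
    EMPTY_MESSAGE = 1 << 2
    TRANSMIT = 1 << 3  # RECEIVE/TRANSMIT, with set meaning transmit (assumed)
    PENDING = 1 << 4
    BUSY = 1 << 5
    DONE = 1 << 6
    INTERRUPT_WHEN_DONE = 1 << 7
    DISABLE_WHEN_DONE = 1 << 8
    MAP_WHEN_DONE = 1 << 9
    TEST_AND_SET = 1 << 10


def complete_transfer(flags: EntryFlags) -> Tuple[EntryFlags, bool]:
    """Return the flags after a transfer completes and whether to interrupt."""
    flags |= EntryFlags.DONE
    if flags & EntryFlags.DISABLE_WHEN_DONE:
        flags &= ~EntryFlags.PENDING         # block further transfers
    if flags & EntryFlags.MAP_WHEN_DONE:
        flags |= EntryFlags.LOCALLY_MAPPED   # make the item visible
    interrupt = bool(flags & EntryFlags.INTERRUPT_WHEN_DONE)
    return flags, interrupt


def set_pending(flags: EntryFlags) -> EntryFlags:
    """Processor re-arms the entry; DONE is cleared when PENDING is set."""
    return (flags | EntryFlags.PENDING) & ~EntryFlags.DONE


start = (EntryFlags.PENDING | EntryFlags.DISABLE_WHEN_DONE
         | EntryFlags.MAP_WHEN_DONE | EntryFlags.INTERRUPT_WHEN_DONE)
after, interrupt = complete_transfer(start)
```

All eleven flags fit in the two-byte flags field, whatever bit order the hardware actually uses.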

Analysis


This chapter examines the performance of GANGLIA. We look at two
current robot systems, the Utah/MIT hand with Unimate PUMA arm used
in NYU's robotics laboratory [Clar89a] and the ping-pong playing robot
developed by Russell Andersson at Bell Laboratories [Ande88]. The data
required by the systems, particularly the lower level servos, are analyzed
to get estimates of the load that would be placed on the network if the
systems were implemented using GANGLIA. These estimates must be viewed
with some skepticism since changing something as fundamental to a system
as the communication architecture is likely to alter the flow of data in the
system. The figures used below tend to be the raw data figures, that is, the
data generated or used by the raw devices; this is especially true for the
Utah/MIT hand figures. A primary rationale for GANGLIA is to facilitate
the placement of processors near the devices they control. This will tend
to reduce the amount of data that must be transmitted over the network.
Consider, for instance, the tactile sensors for the Utah/MIT hand. The
raw bandwidth for the sensors is over 100K words per second (256 values
per finger 50 times a second) but a processor attached to the sensors could
distill the data to a few values and reduce the bandwidth to about 11K
words per second. We believe that the figures are plausible and are not
gross underestimates.

7.1     Model 

The operation of GANGLIA has been presented in the chapters on GANGLIA's
network, protocol, and the CMMU (chapters 5 and 6). What is needed now is
an analytic model of GANGLIA, that is, a model of the communication
traffic to be expected in a GANGLIA system. Such a model is developed in
this section.

In the section on message taxonomy (section 5.5.1) the traffic in a
GANGLIA system is divided into four categories: servo messages, steady state
messages, changing state messages, and exception messages. Servo
messages are the messages involved in the low-level servoing. They are periodic
messages with hard real-time deadlines and high frequency. Generally, they
are the highest priority messages in the network. The steady state messages
are involved in somewhat higher level functions of maintaining the system's
state. These messages will often be periodic but with lower frequency than
the servo messages. Changing state messages are involved with the less
frequent, generally aperiodic changes in system state. The messages will also
be aperiodic. Finally, the exception messages are involved with the very
infrequent and aperiodic notification of exceptional conditions.

Section 5.5.2 describes the structure of the communication cycle. The
cycle has three major components: the servo cycle or servo frame, the
exception poll, and the asynchronous cycle or asynchronous frame. Thus the
traffic is partitioned into three categories.

•  Periodic time-constrained messages include the servo messages plus
some of the steady state messages. These are transmitted during the
servo cycle.

•  Exception messages are transmitted in response to the exception poll.

•  Asynchronous messages include all other messages. These are
transmitted during the asynchronous cycle. We will see later that this
category can be divided into two further categories, those asynchronous
messages that are time-constrained and those that are not
time-constrained. But for now these will be treated as a single category.

We will divide the communication cycle into two parts for purposes of
analysis. Traffic in the synchronous frame, which includes the servo cycle
and the exception poll, is scheduled deterministically by the control node.
The asynchronous frame, which includes the asynchronous traffic, will be
analyzed using a polling model. Note that the exception poll is included with
the synchronous traffic. The poll itself is synchronous and is performed every
cycle. The exception messages are asynchronous but are very infrequent
and will be ignored in the analysis. The synchronous traffic is scheduled
off-line and the control node polls during each cycle according to a static
table. Thus the question of whether messages can be successfully scheduled
is settled before run-time. The question of interest to us is the total
bandwidth consumed by the synchronous messages. We will use two existing robot
systems as models to obtain realistic figures for the synchronous traffic. The
asynchronous traffic is more difficult to analyze. As a first approximation,
we will use the normal assumptions about packet switched traffic, namely
that the number of stations is known, the average message length is known,
the average inter-arrival time is known, and that the traffic is uniformly
distributed among the nodes. We will also assume that a simple round-robin
polling scheme is used to provide access to the network. That is, during the
asynchronous frame the control node simply polls one node after another
with the polling sequence carried over from one cycle to the next. Under
these assumptions we will be able to calculate the average message delay
time and the network throughput. In a sense, this simple analysis provides
desirable bounds (a lower bound for throughput and an upper bound for delay),
since any additional information that is known about the traffic (e.g. which
nodes generate more traffic or commonly occurring message sequences) can
be used by the control node to improve the scheduling. Following the simple
analysis there is some discussion of the possibilities along this line.

Figure 7.1: Communication frame.

Figure 7.1 shows the communication frame. Each cycle begins with the
synchronous traffic. The control node invokes the various message threads
and polls nodes according to tables produced by the scheduler off-line. The
synchronous frame ends with a poll for exception messages. The control
node then waits for a small interval to give any node with an outstanding
exception message a chance to transmit. After waiting, the control node polls
the nodes for asynchronous messages beginning where it left off in the previous
cycle. The control node can poll for the remainder of the frame. However,
to ensure that the next frame will begin on time the controller must not poll
a node unless a message from that node will be completed before the end of
the frame. There is some bandwidth lost due to this fragmentation.
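
The asynchronous part of the frame, including the guard that causes the fragmentation loss, might be sketched as follows. Several assumptions are made for illustration: the control node cannot know a message's size in advance and so reserves room for a poll plus a maximum-size packet before each poll; a poll of a node with nothing to send costs only the poll; and the function name, argument names, and byte accounting are hypothetical.

```python
from typing import List, Tuple

POLL_BYTES = 32    # size of a polling message
MAX_PACKET = 288   # largest packet: 256 data bytes plus 32 bytes of overhead


def poll_async_frame(pending: List[int], start: int,
                     bytes_left: int) -> Tuple[List[int], int, int]:
    """Round-robin poll for the remainder of one frame.

    pending[i] is the on-the-wire size of node i's next message, or 0 if
    it has none. Returns (nodes served, node to start with next cycle,
    bytes of the frame left unused).
    """
    queue = list(pending)      # do not modify the caller's list
    served: List[int] = []
    node = start
    # Poll only while a worst-case reply is certain to finish in time.
    while bytes_left >= POLL_BYTES + MAX_PACKET:
        bytes_left -= POLL_BYTES + queue[node]
        if queue[node] > 0:
            served.append(node)
            queue[node] = 0
        node = (node + 1) % len(queue)
    return served, node, bytes_left


result = poll_async_frame([100, 0, 288, 64], start=2, bytes_left=700)
```

In this example nodes 2 and 3 are served and 284 bytes go unused. Node 0's 100-byte message would have fit, but the control node cannot know that, so polling resumes with node 0 in the next cycle.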

Figure 7.2: Communication packet format.


The synchronous and asynchronous traffic are orthogonal to each other
as far as scheduling is concerned, except that the synchronous traffic takes
bandwidth from the asynchronous traffic. So we will separate the two in our
analysis. For the synchronous traffic we will use existing robot systems to
develop estimates of the bandwidth needed. We will treat the analysis of the
asynchronous traffic as if it were on a network with a bandwidth reduced
by the amount consumed by the synchronous traffic. For example, on a
10 Mbit/second network, if we find that 4 Mbits/second is required for the
synchronous traffic we will analyze the asynchronous traffic as if it were
alone on a 6 Mbit/second network.


7.1.1     Packet  format. 

Figure 7.2 shows the GANGLIA packet format. The message begins with an
8 byte preamble which conditions the line. The message header follows and
is 12 bytes long. The header is described in detail in section 5.5.1. Next
is a 2 byte redundancy check on the header. This is followed by the data,
which can be up to 256 bytes in length, followed by another redundancy
check. Following the message is a postamble. Nothing is transmitted during
the postamble; it represents the interval within which the node receiving the
token must begin transmitting. The receiving node does not have to wait
for the interval to complete; it may begin transmitting as soon as it is ready,
but 8 bytes is the longest it may delay. The postamble is shown as part of
the message because in the analysis we must account for this time and it is
easiest to consider it as part of the message.
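
The per-packet cost implied by this format can be tallied in a few lines. The sketch assumes that the redundancy check on the data is 2 bytes, like the check on the header; the text does not state its size, but this value gives the 32 bytes of overhead used in the traffic tables of section 7.2. The constant and function names are hypothetical.

```python
PREAMBLE = 8       # conditions the line
HEADER = 12        # message header
HEADER_CHECK = 2   # redundancy check on the header
DATA_CHECK = 2     # redundancy check on the data (size assumed)
POSTAMBLE = 8      # longest delay allowed before the next node transmits
MAX_DATA = 256     # largest data field


def packet_bytes(data_bytes: int) -> int:
    """Bytes of network time consumed by one packet, postamble included."""
    if not 0 <= data_bytes <= MAX_DATA:
        raise ValueError("data field must hold between 0 and 256 bytes")
    return (PREAMBLE + HEADER + HEADER_CHECK
            + data_bytes + DATA_CHECK + POSTAMBLE)
```

For example, `packet_bytes(64)` gives 96 and `packet_bytes(256)` gives 288, the largest packet.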

7.2     Synchronous  Traffic 

The first model system for synchronous traffic is based on the Utah/MIT
hand attached to a Unimate PUMA arm used in the NYU Robotics
laboratory. In addition there is a VPL DataGlove which acts as a control master
to the arm and hand. The setup is described in two articles, Teleoperating
the Utah/MIT Hand with the VPL DataGlove. I. DataGlove Calibration
[Hong88] and Teleoperating the Utah/MIT Hand with the VPL DataGlove.
II. Architecture and Applications [Clar89a]. Some additional figures were
provided by Tom Speeter of Bell Laboratories [Spee88b] [Spee88a].

The reader should note that throughout this section 1K means 1,024
(not 1,000). Also note that for the examples the network is assumed to have
a total bandwidth of 10,485,760 bits/second (1,024 x 1,024 x 10).

Tables 7.1, 7.2, and 7.3 show how the traffic demands of the various
devices were calculated. Each row of the tables describes a component of
the system. The first column is the name of the component. The second
calculates the data produced or consumed by the component each cycle.
The cycle here is the component's cycle and should not be confused with the
communication cycle discussed above. Consider the DataGlove in table 7.2:
it produces sixteen two-byte numbers every cycle and the cycle frequency can
be set to either 30 or 60 cycles per second. The third column is the packet
size for the component, including the packet header. Packets are limited
to 288 bytes (256 bytes of data and 32 bytes of header). So in the case of
the raw tactile input in the first table, the data would have to be divided
into 8 packets each cycle. The next column indicates the number of packets
that must be transmitted each cycle. The fifth column is the frequency
of cycles for the component. For some of the components, the frequency
could vary over a wide range and be set as required by performance and
system considerations (for instance the joint positions), while for others
(the PUMA I/O with VAL II for example), it is set by the manufacturer
and is not changeable. The last column is the total bandwidth required by
the component in kilo-bytes per second. It is the product of columns 3, 4,
and 5.
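
The calculation behind each table row can be reproduced with a short function. This is a sketch with hypothetical names; it assumes the data for one cycle is split into the fewest packets allowed by the 256-byte data limit, each carrying 32 bytes of header.

```python
import math

OVERHEAD = 32    # bytes of header per packet
MAX_DATA = 256   # data bytes per packet


def component_kbytes_per_second(data_bytes: int, freq_hz: float) -> float:
    """Bandwidth of one component in Kbytes/second (1K = 1,024)."""
    packets = math.ceil(data_bytes / MAX_DATA)        # packets per cycle
    bytes_per_cycle = data_bytes + packets * OVERHEAD
    return bytes_per_cycle * freq_hz / 1024


glove = component_kbytes_per_second(32, 60)      # DataGlove at 60 hz
tactile = component_kbytes_per_second(2048, 50)  # raw tactile input at 50 hz
```

The DataGlove needs one 64-byte packet per cycle, and the raw tactile input needs eight 288-byte packets per cycle.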

The hand has four fingers each with four joints. Each joint is driven
by two independently controlled tendons. The first row of table 7.1 is the
position sensing data for the joints and the second row is the joint actuator
commands. The third row is the data from load sensors on the tendons.
We employ a servo that combines the position and tension tendon inputs to
control the joint positions. This loop currently runs at 250 hz but it might
be desirable to run the loop faster for fine manipulations.

Description       Data bytes (per cycle)                    Packet  Packets/  Freq.         Kbytes/
                                                            size    cycle     (hz)          second

Joint Position    (4 joints/finger x 2 tendons/joint x      96      1         250 to 1,000  23.44 to 93.75
Input             2 bytes/tendon) x 4 fingers = 64 bytes

Joint Position    (4 joints/finger x 2 tendons/joint x      96      1         250 to 1,000  23.44 to 93.75
Output            2 bytes/tendon) x 4 fingers = 64 bytes

Tendon Tension    (4 joints/finger x 2 tendons/joint x      96      1         250 to 1,000  23.44 to 93.75
Input             2 bytes/tendon) x 4 fingers = 64 bytes

Raw Tactile       (512 points/finger x 1 byte/point) x      288     8         50 to 100     112.5 to 225
Input             4 fingers = 2,048 bytes

Distilled         (6 moments/sensor x 2 sensors/finger x    224     1         50 to 100     10.94 to 21.88
Tactile Input     4 bytes/moment) x 4 fingers = 192 bytes

Table 7.1: Utah/MIT hand servo traffic.

A tactile sensor has been developed for the hand [Spee88b]. This consists
of a 16 by 16 grid of force sensors. The raw tactile input in the fourth row
assumes 2 sets of sensors per finger and obtaining 8 bits of force data per
point. A sampling frequency of 50 hz is used by Speeter [Spee88a] with
100 hz being a possibly more desirable frequency. The 230,400 bytes/second
generated by these sensors at 100 hz is by far the largest bandwidth needed
by any of the devices presented in this section. One of the motivations for
GANGLIA is to allow processing power to be placed near to the sensors. This
is an instance of where this would be desirable. In most situations the raw
data from the sensor is more than needed or effectively used. What might
be more useful would be just the first two moments along the x, y, and z
axes (where z is the force measured at each point). A processor placed near
the sensors could produce these moments and transmit only this distilled
data on the network. The fifth row of the table calculates the bandwidth
needed for this distilled data.

Description       Data bytes (per cycle)        Packet  Packets/  Freq.       Kbytes/
                                                size    cycle     (hz)        second

Lord Wrist        8 gauges x 2 bytes/gauge      48      1         40 to 80    1.88 to 3.75
                  = 16 bytes

PUMA Input        6 DOF x 2 bytes/DOF           44      1         40          1.72
(w/VAL II)        = 12 bytes

PUMA Output       6 DOF x 2 bytes/DOF           44      1         40          1.72
(w/VAL II)        = 12 bytes

PUMA Input        6 DOF x 2 bytes/DOF           44      1         150 to 500  6.45 to 21.48
(w/o VAL II)      = 12 bytes

PUMA Output       6 DOF x 2 bytes/DOF           44      1         150 to 500  6.45 to 21.48
(w/o VAL II)      = 12 bytes

VPL DataGlove     16 DOF x 2 bytes/DOF          64      1         30 to 60    1.88 to 3.75
                  = 32 bytes

Table 7.2: PUMA and DataGlove servo traffic.

The next table (7.2) shows the amount of data produced and consumed
by the PUMA arm and the VPL DataGlove. The first line describes the
requirements of the force and torque sensing wrist made by the Lord
Corporation [Lord83]. The wrist connects the hand to the PUMA arm and
provides force and torque feedback to the arm controller. The next four rows
describe the data for the arm controller itself. Rows 2 and 3 present the
data requirements needed by the PUMA's operating system, VAL II, in the
"alter" mode [Uni]. This is a special mode under which set points must be
provided once every 28 msec. Rows 4 and 5 present estimates of the traffic
that would be required by an alternative controller that controls the arm
directly, bypassing VAL II. The VPL DataGlove [VPL87] [Hong88] is a glove
worn by a person that senses and reports on the position of the hand in the
glove and the positions of the fingers. This is used as a master device to
control the robot arm and hand. Sixteen values are reported either 30 or 60
times per second.

Description       Data bytes (per cycle)                    Packet  Packets/  Freq.     Kbytes/
                                                            size    cycle     (hz)      second

Finger Servo      (2 targets/joint x 4 joints/finger x      96      1         30 to 60  2.81 to 5.62
                  2 bytes/target) x 4 fingers = 64 bytes

Object Servo      (3 DOF/finger x 2 targets/DOF x           128     1         10 to 30  1.25 to 3.75
                  4 bytes/target) x 4 fingers = 96 bytes

Table 7.3: Finger and Object servo traffic.

Finally, table 7.3 is an attempt to set out the communication
requirements of higher level servos. The finger and object level servos are as
described in chapter 3. The figures in the table represent the size of the
periodic data buffer structures used in the hand controller implementation
with Hic in our laboratory.

The reader should remember that the purpose here is to develop
reasonable values for the bandwidth required by the synchronous traffic in a
GANGLIA based system. The data in the tables do not represent a single
existing system but estimates based on several existing systems. The
figures in the tables include the message header as described above, and the
header includes in its 32 bytes the preamble for conditioning the line and
the postamble which allows for the response time of the node receiving the
token. So there is no additional transmission overhead for these messages.
There is, however, polling overhead. In the worst case there will be one
polling message for each of the servo messages, which adds 32 bytes (the
size of a polling message) to the bandwidth required for each message. In
a well designed system the number of polling messages will be considerably
less since threads will be used (see section 5.5.3).
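
The worst-case polling overhead follows directly from the packet rate. The function below is a minimal sketch with hypothetical names, assuming one 32-byte poll per servo packet.

```python
POLL_BYTES = 32   # size of a polling message


def worst_case_polling_kbytes(packets_per_second: int) -> float:
    """Polling bandwidth in Kbytes/second with one poll per packet."""
    return packets_per_second * POLL_BYTES / 1024


# The three hand messages of table 7.1 at 1,000 hz give 3,000 packets/second.
hand_polling = worst_case_polling_kbytes(3 * 1000)
```

For the hand at its highest frequency this comes to 93.75 Kbytes/second, one third of the 281.25 Kbytes/second needed for the hand's 96-byte data packets themselves.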

First consider what we shall call the maximal system. This includes all
the components in the tables at their highest frequencies. It also assumes
that the raw tactile sensor data is transmitted over the network and that
the PUMA control bypasses VAL II (which requires much more bandwidth).
The hand component includes the top three rows from table 7.1 and the
PUMA data includes both the sense inputs and the actuator outputs. In
addition, the column on bandwidth for polling assumes the worst case (i.e.
polling for every message). For this case the total bandwidth required for
servo messages (data plus polling) is 723.19 Kbytes/second of a total of 1,280
Kbytes/second (assuming a 10,485,760 bit/second medium). This appears
quite reasonable at first, but there is more system overhead to be added
later.

Component        Frequency  Number of  Polling        Data
                 (hz)       Packets    (Kbytes/sec)   (Kbytes/sec)

Hand             1,000      3,000      93.75          281.25
Tactile Sensor   100        800        25.00          225.00
Wrist            80         80         2.50           3.75
PUMA Arm         500        1,000      31.25          42.96
DataGlove        60         60         1.88           3.75
Finger Servo     60         60         1.88           5.62
Object Servo     30         30         .94            3.75

Total                       5,030      157.20         565.99

Table 7.4: Maximal system servo traffic.

This is the figure for the average cycle. To check that the system is feasible let us look at the worst possible cycle. This is a cycle in which three of the hand packets are transmitted and one of each of the other packets. The total is 956 bytes transmitted in one cycle. The bandwidth of the network is 1,280 Kbytes/second so in one cycle 1,310 bytes can be transmitted, which leaves 354 bytes for asynchronous traffic in the most heavily used cycle.
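The worst-cycle budget can be checked with a few lines of Python. This is only a sketch: the per-packet sizes below are not stated in this section and are an assumption, obtained by dividing each data rate in table 7.4 by its packet rate (header included).

```python
KBYTE = 1024
BANDWIDTH_KBYTES = 1280   # 10,485,760 bit/second medium
CYCLE_HZ = 1000           # the hand servo rate sets the cycle of the maximal system

# Assumed bytes per packet, derived from table 7.4 (data rate / packet rate).
PACKET_BYTES = {
    "Hand": 96, "Tactile Sensor": 288, "Wrist": 48, "PUMA Arm": 44,
    "DataGlove": 64, "Finger Servo": 96, "Object Servo": 128,
}

def worst_cycle_bytes(packet_bytes, hand_packets=3):
    """Three hand packets plus one packet from every other component."""
    others = sum(size for name, size in packet_bytes.items() if name != "Hand")
    return hand_packets * packet_bytes["Hand"] + others

capacity = BANDWIDTH_KBYTES * KBYTE // CYCLE_HZ   # whole bytes per cycle
used = worst_cycle_bytes(PACKET_BYTES)
print(capacity, used, capacity - used)   # 1310 956 354
```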

One problem with the maximal system, besides the large amount of bandwidth required, is that the frequencies of the various components do not match nicely. Creating a schedule for the servo messages is much easier if the number of packets required divides (or is divisible by) the highest frequency. So now consider a mid-range system with moderate, compatible frequencies. Table 7.5 describes this system, which has a total servo bandwidth (data plus polls) of 480.71 Kbytes/second. Also consider the low-range system which uses the distilled tactile sensor data and uses VAL II to control the arm (table 7.6). Notice that the frequencies of the hand, the tactile sensors, and the wrist have changed. This is to match the frequency of the PUMA under VAL II, which is determined by the manufacturer and cannot be changed. The low-range system has a total servo bandwidth of 230.01 Kbytes/second.

Component        Frequency   Number of        Polling           Data
                      (Hz)     Packets   (Kbytes/sec)   (Kbytes/sec)

Hand                   600       1,800          56.25         168.75
Tactile Sensor          75         600          18.75         168.75
Wrist                   75          75           2.34           3.52
PUMA Arm               300         600          18.75          25.78
DataGlove               60          60           1.88           3.75
Finger Servo            60          60           1.88           5.62
Object Servo            30          30           0.94           3.75

Total                            3,225         100.79         379.92

Table 7.5: Mid-range system servo traffic.

Component        Frequency   Number of        Polling           Data
                      (Hz)     Packets   (Kbytes/sec)   (Kbytes/sec)

Hand                   480       1,440          45.00         135.00
Tactile Sensor          60          80           2.50          17.50
Wrist                   80          80           2.50           3.75
PUMA Arm                40          80           2.50           3.44
DataGlove               60          60           1.88           3.75
Finger Servo            60          60           1.88           5.62
Object Servo            30          30           0.94           3.75

Total                            1,830          57.20         172.81

Table 7.6: Low-range system servo traffic.

The final system we will consider is based on the ping-pong playing robot developed by Russell Andersson [Ande88] (the values used came from a private communication). Table 7.7 presents the major components of the system and their communication requirements. Table 7.8 shows the bandwidth required for the servo traffic and the polling messages (worst case).
Description            Data bytes   Packet   Packets/    Freq.    Kbytes/
                      (per cycle)     size      cycle     (Hz)     second

Low Level Vision               44       76          1       60       4.45
Trajectory Analysis            52       84          1       60       4.92
Robot Controller              500      282          2       50      27.54
Kinematics Dynamics            50       82          1       40       3.20
Joint Actuators                16       48          2    1,140     106.88

Table 7.7: Ping-pong playing robot servo traffic.

Component        Frequency   Number of        Polling           Data
                      (Hz)     Packets   (Kbytes/sec)   (Kbytes/sec)

Vision                  60          60           1.88           4.45
Trajectory              60          60           1.88           4.92
Controller              50         100           3.12          27.54
Kinematics              40          40           1.25           3.20
Actuators            1,140       2,280          71.25         106.88

Total                            2,540          79.38         146.99

Table 7.8: Ping-pong system bandwidth requirements.


7.2.1     Exception polling and fragmentation

In order to calculate the amount of synchronous traffic for a system, two more components must be taken into account. They are exception polling and fragmentation.

In order to accommodate exceptional conditions which are infrequent, very important, and asynchronous, the GANGLIA protocol has an exception poll. Basically, the control node transmits a message once a cycle (or as often as necessary) that indicates that any node with an exception message may transmit and then waits briefly for any exceptions before continuing with the normal pattern. For our analysis we will assume the exceptions are polled once each communication cycle and that the control node waits for an interval equal to the transmission time of the poll before continuing. We will call the time it takes to transmit a poll (an exception poll or a normal poll) $t_p$. A poll is simply an empty message and thus is equal to 32 bytes. Thus $2t_p$, or 64 bytes, of the total bandwidth is consumed during each communication cycle for exception polling.

Fragmentation is caused by the fact that the control node must be sure to stop polling for asynchronous messages in enough time to ensure that it is able to start the next communication cycle on time. Consider the communication frame in figure 7.1. During the asynchronous part of the frame the control node must not poll after the dashed line because the response message may not finish before the beginning of the next frame. The worst case is that the response message is of the maximum length. Thus the control node must not poll within $t_p + t_{max}$ seconds of the beginning of the next frame, where $t_{max}$ is the time it takes to transmit the maximum length message (256 bytes of data plus 32 bytes of header). Note that a response message or even a poll may transmit in the interval between the dashed line and the start of the next frame, but the control node may not start a poll in this interval. This lost bandwidth is called fragmentation.

Determining the amount of fragmentation is not as straightforward as it seems. The fragmentation in a single cycle is

$$t_f = (t_p + t_{max}) - \text{the spillover from the last message}$$

where "the spillover ..." is the amount the last message (including the poll for the message) of the cycle extends beyond $t_p + t_{max}$ seconds before the end of the cycle (i.e. the dashed line in figure 7.1). The expected fragmentation is then

$$E[t_f] = (t_p + t_{max}) - E[\text{spillover}] \qquad (7.1)$$

since $t_p + t_{max}$ is a constant. The problem is to determine $E[\text{spillover}]$. Naively using the analogy of fragmentation on a disk or in a paging system, one would say that $E[\text{spillover}]$ is one-half of the combination of a poll and the expected message size. However, since the messages are not of a fixed length this does not hold.

If we ignore, for the moment, that each polling sequence has a minimum size, that is, ignore the poll message and the header of the response message, then the messages range in size from zero bytes to $t_{max}$. It turns out that for uniformly distributed message sizes $\frac{2}{3}E[t_m]$ is a better estimate of the expected spillover. For exponentially distributed message sizes with $a = E[t_m]$ (i.e. with density function $\frac{1}{a}e^{-t/a}$) then $E[t_m]$ is a better estimate. Appendix A gives some insight into the derivation of these values and also takes into account the poll message and message header. A report by Clark and Mishra [Clar89b] contains the details. In any case, the fragmentation is dominated by the maximum message size and the spillover of the last message merely reduces the fragmentation by a fraction of $t_{max}$. So the amount of spillover is not crucial to the analysis.

In the absence of concrete data on the characteristics of asynchronous messages in GANGLIA we will assume that the data portion of the messages are exponentially distributed with mean size of 32 bytes. In Appendix A we see that

$$E[\text{spillover}] = \frac{\mu_s}{2} + \frac{\sigma^2}{2\mu_s}$$

where $\mu_s = t_p + E[t_m]$ is the expected length of the poll for a message and the message itself (including the header) and $\sigma^2$ is the variance of the size of the data portion of the messages. Assuming an exponential distribution with a mean message length of 32 bytes and plugging this into equation 7.1 we get $t_f = 32 + 288 - 53 = 267$ bytes, which is 0.26 Kbytes.

Now we can compute the bandwidth consumed by the synchronous traffic. Table 7.9 shows the results for each of the example systems. The first column is the name of the system and the second is the frequency of the basic communication cycle or frame. The remaining columns are in units of Kbytes/second (1K = 1,024). Columns 3 and 4 are from previous tables and are the amount of servo data and the amount of polling respectively. The next two columns show the loss due to exception polls and fragmentation which are based on the frequency as discussed above. The next column is the total bandwidth consumed by synchronous traffic (the sum of columns 3, 4, 5, and 6). The last column is the bandwidth remaining (from a total of 1,280) for asynchronous traffic. For comparison purposes, 233.57 Kbytes/second is equivalent to 1.91 Mbits/second while 884.89 Kbytes/second is equivalent to 7.24 Mbits/second.

System     Frequency     Data    Polls  Exception  Fragmentation  Synchronous  Asynchronous
                (Hz)                        Polls                     Traffic       Traffic

Maximal        1,000   565.99   157.20      62.50         260.74     1,046.43        233.57
Mid-range        600   379.92   100.79      37.50         156.44       674.65        605.35
Low-range        480   182.81    57.20      30.00         125.16       395.17        884.89
Ping-pong      1,140   146.99    79.38      71.25         297.25       594.87        685.13

Table 7.9: Synchronous and asynchronous traffic (Kbytes/sec).
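The arithmetic behind the fragmentation estimate and the synchronous totals can be sketched in Python as follows. The function names are ours; the sketch assumes, as the text does, an exponentially distributed data portion with a mean of 32 bytes (so its variance is the mean squared), and it rounds the spillover to whole bytes before applying equation 7.1.

```python
KBYTE = 1024
TOTAL_KBYTES = 1280   # capacity of the medium in Kbytes/second
T_P = 32              # bytes in a poll or a message header
T_MAX = 256 + 32      # largest message: 256 data bytes plus the header
MEAN_DATA = 32        # assumed mean of the exponentially distributed data

def expected_spillover(mean_data=MEAN_DATA):
    """mu_s / 2 + sigma^2 / (2 * mu_s), in bytes."""
    mu_s = T_P + (T_P + mean_data)   # poll + header + mean data portion
    variance = mean_data ** 2        # exponential: variance = mean squared
    return mu_s / 2 + variance / (2 * mu_s)

def fragmentation_bytes():
    """Equation 7.1, with the spillover rounded to whole bytes."""
    return T_P + T_MAX - round(expected_spillover())

def synchronous_kbytes(freq_hz, data, polls):
    """Data + polls + exception polls + fragmentation, in Kbytes/second."""
    exception = 2 * T_P * freq_hz / KBYTE
    fragmentation = fragmentation_bytes() * freq_hz / KBYTE
    return data + polls + exception + fragmentation

# name: (cycle frequency in Hz, data, polls), data and polls in Kbytes/second
SYSTEMS = {
    "Maximal":   (1000, 565.99, 157.20),
    "Mid-range": (600, 379.92, 100.79),
    "Low-range": (480, 182.81, 57.20),
    "Ping-pong": (1140, 146.99, 79.38),
}
for name, (freq, data, polls) in SYSTEMS.items():
    sync = synchronous_kbytes(freq, data, polls)
    print(f"{name:10s} synchronous {sync:8.2f}  asynchronous {TOTAL_KBYTES - sync:8.2f}")
```

Because exception polling and fragmentation are both fixed costs per cycle, their share of the bandwidth grows linearly with the cycle frequency, which is why the ping-pong system loses the most to them despite carrying the least servo data.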

7.2.2     Reducing  synchronous  bandwidth. 

The  above  examples  tend  to  assume  the  worst  cases.  Not  just  the  worst 
statistical  cases  but  also  the  worst  design  cases.  These  assumptions  have 
increased  the  bandwidth  required  by  the  synchronous  traffic.  In  this  section 
we  will  look  briefly  at  some  of  these  assumptions. 

First we look at the assumption that each message is polled for individually. A major feature of the GANGLIA protocol is that the token (or poll) is attached to each message so that common sequences of messages can be transmitted without intervening polls. This feature is especially useful among the synchronous messages. Consider the maximal system (see table 7.4). The most common messages are the hand servo messages. There are three messages from three sources every cycle. These are perfect candidates for combination into a thread (see section 5.5.3). This would reduce
the number of polls for these messages from 3,000 to 1,000, which means a reduction of 62.5 Kbytes/second, or about 40%, in the bandwidth consumed by polls, all of which becomes available for asynchronous traffic. Notice also that the arm servo messages might be included in this thread, which would remove another 1,000 polls/second and cut the polling bandwidth by a further 20%.
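The saving from threads is simple to quantify: every poll removed returns 32 bytes to the asynchronous traffic. A short Python sketch, using the packet counts of the maximal system (the helper name `poll_savings_kbytes` is ours), expresses each saving in Kbytes/second and as a share of the worst-case polling bandwidth of table 7.4:

```python
POLL_BYTES = 32
KBYTE = 1024
WORST_CASE_POLLING = 157.20   # Kbytes/second, total of table 7.4

def poll_savings_kbytes(polls_before, polls_after):
    """Bandwidth released when polls are replaced by tokens inside a thread."""
    return (polls_before - polls_after) * POLL_BYTES / KBYTE

hand = poll_savings_kbytes(3000, 1000)   # three hand messages share one poll
arm = poll_savings_kbytes(1000, 0)       # arm messages ride on the same thread

print(hand, arm)                                   # 62.5 31.25
print(round(100 * hand / WORST_CASE_POLLING),      # 40
      round(100 * arm / WORST_CASE_POLLING))       # 20
```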

Simple data compression techniques can also be applied. The data for the hand servos are all 12 bit values (the A/D's and D/A's for the hand are 12 bits wide) but the calculations in table 7.1 reserve 16 bits for each value. Packing this data so that there are two values in three bytes would save 25% of the 281.25 Kbytes/second consumed by these servos, which is more than was saved by reducing the polling above. The tactile sensor data presents an interesting exercise. In table 7.4 we have divided the data produced by the sensors in one second into 800 packets with 256 bytes of data each. If we are able to divide the data instead into 1,000 smaller packets we increase the bandwidth consumed by the data packets (because there are 200 more packet headers) but the overall bandwidth can be reduced by including these packets in the above threads and thus reducing the number of polls per second by 800.
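Packing two 12-bit values into three bytes, as suggested above for the hand servo data, might look like the following Python sketch. The byte layout (high bits first) and the names `pack12` and `unpack12` are our own choices for illustration; the thesis does not specify a format.

```python
def pack12(values):
    """Pack 12-bit integers two per three bytes, high bits first."""
    values = list(values)
    if len(values) % 2:
        values.append(0)                      # pad to an even count
    out = bytearray()
    for a, b in zip(values[0::2], values[1::2]):
        if not (0 <= a < 4096 and 0 <= b < 4096):
            raise ValueError("values must fit in 12 bits")
        out += bytes([a >> 4, ((a & 0xF) << 4) | (b >> 8), b & 0xFF])
    return bytes(out)

def unpack12(data, count):
    """Inverse of pack12; count is the number of values originally packed."""
    values = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        values.append((b0 << 4) | (b1 >> 4))
        values.append(((b1 & 0xF) << 8) | b2)
    return values[:count]

samples = [0, 4095, 0xABC, 0x123, 7]
packed = pack12(samples)
print(len(packed), unpack12(packed, len(samples)) == samples)   # 9 True
```

With this layout 32 values occupy 48 bytes instead of the 64 bytes needed at 16 bits each.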

The other major source of overhead in this section is fragmentation. Fragmentation is based on the fundamental cycle time of the system. This cycle time is a design parameter that the system designer may be able to vary but rarely by a significant amount. Fragmentation occurs, however, because the start of each communication cycle is considered a hard deadline. If the deadline can be softened then fragmentation can be reduced. In the extreme case, if the start of each cycle can vary by 244 µseconds then there will be no fragmentation at all (244 µseconds is the time it takes to poll for and transmit the largest size message). This is 25% of the cycle time in the maximal system, which is probably too much variance, but whatever variance is possible reduces the fragmentation.

Another way to reduce fragmentation is to reduce the maximum size of the packets, since $t_f$ in equation 7.1 is dominated by the maximum packet size, $t_{max}$. If it is possible to limit all asynchronous packets to 128 bytes of data, for instance, the amount of fragmentation would be reduced considerably. A way to effectively reduce the size of the largest packet would be to poll for smaller messages toward the end of the asynchronous frame. Suppose there were a class of messages that were known to be small, for instance, messages that report non-critical changes in status of the nodes. The control node could then poll for arbitrary messages during the initial part of each asynchronous frame and then switch to polling for the smaller messages as the deadline for the next cycle approached. The $t_{max}$ used to calculate fragmentation would then be the maximum size of these smaller messages.
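The switch from arbitrary polls to small-message polls near the end of the frame can be sketched as a simple rule in the control node. The Python below is only an illustration under our own assumptions: the message classes, their maximum sizes, and the function name `choose_poll` are hypothetical and do not come from the GANGLIA design.

```python
T_P = 32   # bytes needed to transmit a poll

def choose_poll(bytes_left, max_by_class):
    """Return the largest message class that is still safe to poll for.

    bytes_left   -- bytes that can still be sent before the next cycle starts
    max_by_class -- {class name: maximum message size in bytes, header included}
    """
    safe = {name: size for name, size in max_by_class.items()
            if T_P + size <= bytes_left}
    if not safe:
        return None   # too late to start any poll; the remainder is fragmentation
    return max(safe, key=safe.get)

CLASSES = {"any": 288, "status": 64}   # hypothetical classes and sizes

print(choose_poll(400, CLASSES))   # any
print(choose_poll(200, CLASSES))   # status
print(choose_poll(90, CLASSES))    # None
```

Under this rule the interval in which no poll may start shrinks from the poll plus the largest message of any class to the poll plus the largest message of the smallest class.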

7.3     Asynchronous traffic

In this section we consider the asynchronous traffic in a GANGLIA system. The asynchronous traffic is harder to characterize than the synchronous traffic because the asynchronous traffic is more implementation specific. In the last section we presented estimates of the synchronous traffic for several hypothetical systems. This is feasible because we can characterize the servo data needs of the various devices in a fairly implementation independent manner. The asynchronous traffic is more dependent on the actual implementation. It depends on the algorithms used, the instrumentation, and the functionality implemented. As a first pass we will apply a standard polling system analysis. We will then discuss some of the particular aspects of GANGLIA systems that do not fit the standard model. In particular, we will look at time-constrained messages.

Polled  communication  systems  are  divided  into  four  disciplines  [Taka88]. 

•  In  exhaustive  service  systems  a  station,  when  polled,  transmits  until 
it  has  transmitted  all  its  pending  messages. 

•  In gated service systems a station transmits all messages that are pending at the time it is polled. The difference between exhaustive and gated service is that under exhaustive service a station could hold the medium indefinitely, if it generates messages faster than it can transmit, whereas gated service ensures the service time to each node will be finite. As a trade-off, though, gated service systems have a longer average delay time than exhaustive service systems.

•  An intermediate approach is the decrementing service scheme. In this scheme a node which has $n$ messages pending at the time of a poll transmits until it has $n - 1$ messages pending.

•  Under  the  limited  service  scheme  a  node  may  transmit  a  fixed  number 
of  messages  each  time  it  is  polled,  usually  1  message. 

Ganglia  falls  under  this  last  discipline. 
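The four disciplines differ only in when a polled station must stop transmitting. The following Python sketch makes the stopping rules concrete; the function `messages_sent` and its simple arrival model (a list giving the number of new messages that arrive while each message is being sent) are our own illustration, not part of Takagi's analysis.

```python
def messages_sent(discipline, pending, arrivals=(), limit=1):
    """Number of messages a station transmits in response to one poll.

    pending  -- messages queued when the poll arrives
    arrivals -- arrivals[i] new messages arrive while the i-th message
                of this visit is being transmitted
    limit    -- per-poll quota for the limited discipline
    """
    queue, sent = pending, 0
    while True:
        if discipline == "exhaustive":
            go = queue > 0                          # until the queue is empty
        elif discipline == "gated":
            go = sent < pending                     # only what was there at the poll
        elif discipline == "decrementing":
            go = pending > 0 and queue >= pending   # until queue is pending - 1
        elif discipline == "limited":
            go = queue > 0 and sent < limit         # fixed quota per poll
        else:
            raise ValueError(discipline)
        if not go:
            return sent
        new = arrivals[sent] if sent < len(arrivals) else 0
        queue += new - 1
        sent += 1

for d in ("exhaustive", "gated", "decrementing", "limited"):
    print(d, messages_sent(d, pending=3, arrivals=(1, 0, 0, 0)))
# exhaustive 4, gated 3, decrementing 2, limited 1
```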

In Analysis of Polling Systems [Taka86] Takagi extensively analyzes the
various polling disciplines. We will follow his analysis. A symmetric system
is  a  system  in  which  all  the  nodes  are  statistically  equivalent  (i.e.  they  have 
the  same  inter-arrival  time  distribution  for  messages  and  the  same  message 
length  distribution).  For  a  symmetric  polling  system  with  limited  service 
the  expected  waiting  time  for  a  message  is  given  by 

$$E[W] = \frac{\delta^2}{2r} + \frac{N[\lambda b^{(2)} + r(1 + \lambda b) + \lambda\delta^2]}{2[1 - N\lambda(r + b)]} \qquad (7.2)$$

where $N$ is the number of stations or nodes, $\lambda$ is the arrival rate for messages at each station, $b$ is the mean message service time (i.e. $E[t_m]$ from above), $b^{(2)}$ is the second moment of message service time, $r$ is the mean reply time, and $\delta^2$ is the variance of reply time. Reply time, which is sometimes called walk time, is the time it takes to poll each node (or the time to walk from station to station). In a GANGLIA system the reply time is the same for all stations, the time it takes to transmit a poll message. Thus $\delta^2 = 0$ and we can simplify equation 7.2 to

$$E[W] = \frac{N[\lambda b^{(2)} + r(1 + \lambda b)]}{2[1 - N\lambda(r + b)]} \qquad (7.3)$$

Figure 7.3: Asynchronous traffic delay ($N = 20$).

Figure 7.3 shows the average delay for asynchronous messages in the four sample systems. The following assumptions were made: the number of stations $N = 20$, the average message length $b = 2r$ or 32 bytes of data, and the variance in message length $b^{(2)} - b^2 = r$ or 32 bytes. The maximal system fares poorly. The asynchronous traffic in this system would be very sensitive to the load. The other systems are feasible: with each station generating 100 messages per second, which is a very heavy load, the average delay is less than 0.08 second. For GANGLIA systems another important measure of time is the communication cycle time. The system cannot respond to anything involving more than one node in less than a cycle time and most of the activities, particularly at the low levels, will be tied to the cycle time to some extent. Figure 7.4 shows the same throughput versus delay graphs normalized to the cycle time for each system.

Figure 7.4: Normalized asynchronous traffic delay ($N = 20$).
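Equation 7.3 is easy to evaluate numerically. The Python sketch below implements the formula as written; the example that follows it rests on assumptions of ours that the text does not confirm: that poll and message times are measured against the asynchronous bandwidth left in table 7.9, and that the 32-byte spread of the message length is read as a standard deviation. It is not claimed to reproduce the curves of figure 7.3.

```python
def expected_wait(n, lam, b, b2, r):
    """Equation 7.3: mean waiting time for symmetric limited-service polling
    with a constant reply time.

    n   -- number of stations
    lam -- message arrival rate at each station (messages/second)
    b   -- mean message service time (seconds)
    b2  -- second moment of the message service time (seconds squared)
    r   -- reply (walk) time, i.e. the time to transmit a poll (seconds)
    """
    load = n * lam * (r + b)
    if load >= 1:
        raise ValueError("unstable: N * lambda * (r + b) must be below 1")
    return n * (lam * b2 + r * (1 + lam * b)) / (2 * (1 - load))

# Illustrative evaluation for the ping-pong system (assumptions as stated above).
async_bytes_per_sec = 685.13 * 1024   # asynchronous share from table 7.9
r = 32 / async_bytes_per_sec          # one 32-byte poll
b = 2 * r                             # 32-byte header plus 32 bytes of data
b2 = b * b + r * r                    # second moment = mean^2 + variance (assumed r^2)
print(expected_wait(n=20, lam=100, b=b, b2=b2, r=r))
```

The denominator shows why the maximal system is sensitive to load: as $N\lambda(r + b)$ approaches 1 the waiting time grows without bound, and a smaller asynchronous share makes both $r$ and $b$ larger.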

The assumption that the nodes are equivalent is not reasonable in a GANGLIA system. One would imagine that typical nodes do not generate a lot of asynchronous traffic. They control their sensors or actuators, producing or consuming servo data, and they respond to intermittent commands from higher level control. On the other hand, the node that connects the network to the higher level controllers and other nodes involved in higher level functions might generate more than their share of asynchronous traffic. One might suppose that the node connected to the higher level controllers would generate nearly half of the asynchronous traffic. Traffic patterns such as this can be taken into account by the control node to minimize the delay times. Takagi [Taka86] [Taka88] provides formulas for computing delay in asymmetric systems but further speculation on the distribution of asynchronous load in this thesis is unproductive.
7.3.1     Time-constrained traffic.

So far we have looked at asynchronous traffic in the traditional non-time-constrained manner; however, there will undoubtedly be time-constrained asynchronous traffic. Exception messages are one way to deal with time-constrained messages. An exception message can be transmitted within one communication cycle time (assuming only one exception is raised) and this is as fast as the system can respond to anything. However, the intent of the protocol design is that exception messages be reserved for critical and rare conditions. In the remainder of this section, we discuss some other approaches to handling time-constrained traffic.

The  polling  sequences  we  have  discussed  are  deterministic,  so  there  is  a 
bound  on  the  interval  between  polls  for  any  node.  The  worst  case  would 
be  that  every  node  transmits  the  maximum  length  packet.  Tighter  bounds 
can  be  obtained,  though,  since  the  traffic  patterns  of  the  nodes  are  known 
and  there  are  probably  many  nodes  that  never  send  the  maximum  length 
packet.  Thus,  if  the  bound  is  within  the  time  constraint  of  the  message  then 
the  node  can  simply  ensure  that  the  message  is  transmitted  the  next  time 
the node is polled. Essentially, time-constrained messages are put at the
head of the node's transmission queue. There are two problems with this
scheme.  One  is  that  a  node  can  have  only  a  single  time-constrained  message 
pending  at  a  time.  This  is  restrictive  in  general  but  for  most  GANGLIA  nodes 
it  is  not  an  unreasonable  restriction.  The  other  problem  is  that  the  time 
constraint  may  not  be  satisfied  by  waiting  for  the  next  "normal"  poll. 
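The bound on the interval between polls of a node can be sketched as follows. This is a rough Python illustration under our own assumptions: one round-robin pass over all nodes, a known maximum message size for each node, and a steady asynchronous byte rate (taken from table 7.9) when converting bytes to time. The function names are hypothetical.

```python
T_P = 32   # bytes in a poll

def max_poll_interval_bytes(max_message_bytes):
    """Bytes on the wire between two consecutive polls of one node when every
    node is polled once per pass and answers with its largest message."""
    return sum(T_P + size for size in max_message_bytes)

def within_constraint(max_message_bytes, async_bytes_per_sec, constraint_sec):
    """Rough check that the worst-case pass fits inside a time constraint."""
    interval = max_poll_interval_bytes(max_message_bytes) / async_bytes_per_sec
    return interval <= constraint_sec

loose = max_poll_interval_bytes([288] * 20)              # every node may send 288 bytes
tight = max_poll_interval_bytes([288] * 4 + [64] * 16)   # most nodes send short messages
print(loose, tight)   # 6400 2816
```

Knowing that most nodes never send a maximum length packet more than halves the bound in this example, which is the tightening the text describes.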

This  second  problem  could  be  addressed  by  polling  the  affected  nodes 
more  frequently.  The  central  node  does  not  have  to  poll  in  a  round  robin 
manner.  The  polling  schedule  can  be  made  to  match  the  time  constraints 
of  the  various  nodes.  This  is  not  a  general  solution.  It  could  solve  par- 
ticular problem  situations  but  trying  to  apply  it  in  general  would  lead  to 
complicated  and  brittle  schedules  and  the  extra  polls  tend  to  degrade  the 
non-time-constrained  performance. 

It is possible that a condition which first appears to be asynchronous may, in fact, be better handled in a synchronous manner. If the condition is sufficiently important, occurs with sufficient frequency, and imposes tight time constraints, the messages associated with it can be included in the synchronous traffic. Polls for the appropriate messages would be included in the synchronous schedule. This is an application of a technique commonly used to schedule asynchronous processes amongst periodic processes in a time-constrained system [Mok83] [Sha86]. In GANGLIA as it is, this is the most general way to accommodate frequent or tightly constrained asynchronous conditions.

Window protocols. An extension to the GANGLIA protocol would allow time-constrained asynchronous messages to be handled by a variant of the window protocols discussed in section 2.3.2. The main objection to the window protocols in that section was that they require a sophisticated communication controller in each node. These controllers must individually keep track of the global state. That is, as collisions are detected and messages transmitted each node must update its internal records of the state of the protocol. All of the nodes must agree on the state for the protocols to work properly. In a GANGLIA-like system only the control node must maintain the state of the protocol. The extended GANGLIA would work as follows: Most of the time the system would behave normally. Periodically the control node would poll for time-constrained messages, in a manner similar to polling for exceptions. It would poll by sending a special message containing the current boundaries of the window. Then any node with a message in the window could transmit. This might lead to a collision, if two or more nodes have messages in the window. In this case the control node detects the collision and updates its boundaries in accordance with the windowing scheme. A new poll might be transmitted immediately or the control node may wait until the next scheduled poll for time-constrained messages.
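The control node's side of such a window poll can be sketched in Python. The splitting rule used here (halve the window on a collision, try the earlier half first, and return to the later half when a poll goes unanswered) is purely our assumption for illustration; the text leaves the windowing scheme open, and the function name `resolve_window` is hypothetical.

```python
def resolve_window(message_times, lo, hi, max_polls=32):
    """Poll the window [lo, hi) and split it on collisions until exactly one
    message answers.

    message_times -- time stamps of the pending time-constrained messages
                     (one per node); known only to the nodes in a real system
    Returns (time stamp of the message transmitted or None, polls used).
    """
    polls = 0
    later_halves = []                       # halves set aside after a split
    while polls < max_polls:
        polls += 1
        responders = [t for t in message_times if lo <= t < hi]
        if len(responders) == 1:
            return responders[0], polls     # successful transmission
        if not responders:
            if not later_halves:
                return None, polls          # nothing pending in the window
            lo, hi = later_halves.pop()     # move on to a set-aside half
        else:
            mid = (lo + hi) / 2             # collision: narrow the window
            later_halves.append((mid, hi))
            hi = mid
    return None, polls                      # gave up, e.g. identical time stamps

print(resolve_window([1.0, 3.0, 7.0], 0, 8))   # (1.0, 3)
print(resolve_window([5.0, 6.0], 0, 8))        # (5.0, 4)
```

Only the control node keeps the window state in this sketch; a node needs no more than a comparison of its own message time against the boundaries carried in the poll.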

This scheme is appealing. First of all, nodes which do not transmit
time-constrained messages are not involved at all. An unsophisticated node only
has  to  respond  to  a  direct  poll,  just  as  before.  Second,  it  is  not  necessary 
that  any  node  detect  collisions  but  the  control  node.  This  can  simplify  the 
transceiver  electronics.  The  nodes  do  need  to  know,  somehow,  that  their 
message  was  successfully  transmitted.  This  could  be  achieved  by  a  message 
from  the  control  node.  Finally,  it  is  possible  that  this  scheme  could  supplant 
the  whole  exception  message  mechanism.  How  a  windowing  mechanism 
could  be  integrated  in  GANGLIA  is  an  avenue  for  further  research. 
SERVO LOOPS are a fundamental part of robot control software. They perform low-level basic functions and are usually under the tightest time constraints of all the parts of the system. They also have special properties that can be exploited by the operating systems and communication systems that support them. This thesis presents an operating system, HIC, and a network, GANGLIA, that were designed explicitly for servo loop systems.

8.1  Hic 

There  are  two  interesting  parts  of  HIC.  The  scheduler  is  elegant,  simple,  well 
suited  for  servo  loops  and  nicely  combines  periodic  and  aperiodic  events  into 
a single paradigm. Periodic Data Buffers are a new communication structure that provides a simple yet effective inter-process communication scheme for servo loops. The advantage of pdb's is that they use a non-blocking method of communication. This is important in HIC and to servo loops, but it is unlikely that they could be used much in other areas.

8.2  Ganglia 

GANGLIA is a communication architecture and protocol for intra-robot communication that has wide ranging implications for robot controller architectures. The idea is to try to place the computing power in a robot near the actuators and sensors. This would reduce problems of cabling between a robot and its controller. It would reduce noise problems on delicate signals from distant sensors. It would also provide a sort of modularity to robot components.

The  contributions  of  ganglia  are: 

•  a polling/token passing protocol that integrates synchronous and asynchronous time-constrained traffic.

•  "named messages" which are messages that are identified by their contents instead of their destination address.

•  a communication manager, the CMMU, which handles the token passing protocol and in conjunction with named messages implements a network-wide memory. The CMMU also facilitates synchronization between nodes via a test-and-set operation.

The Analysis chapter demonstrates that the use of GANGLIA is feasible in several real robot systems that represent the state of the art in robot complexity.

There  is  much  work  that  needs  to  be  done  on  ganglia.  Some  of  the 
prominent  tasks  are: 

•  that the test-and-set operation as described is susceptible to communication failure. Suppose that the message returned by the node that sets the test-and-set flag is lost. Then the node requesting the operation is uncertain whether or not it was successful and, as described, there is no way to find out. Similarly, if the clear message is lost there is no way to recover. One solution would be to indicate which node or process sets the flag, in which case it is possible to resolve these situations.

•  to develop a suitable method for collision resolution when more than one node responds to an exception poll. The method suggested in section 5.5 is workable but a superior method should be found.

•  whether or not it is feasible to implement some form of window protocol for time-constrained asynchronous traffic. An outline of a method is made in the text but its feasibility should be investigated.

•  the investigation of methods of flow control. If there were a distributed method to reduce the load by inhibiting less important messages during periods of heavy load then the burden of the scheduler and the control node could be eased. A possible approach is to associate a time interval with each message and if the time interval between consecutive times that a node receives the token exceeds this given interval then the message is not transmitted, although the token is passed on. Preliminary investigations [Dixo87] of this technique indicate that it provides upper bounds on the interval between times the node receives the token, a desirable property.

•  that it has been assumed that the CMMU could be implemented on a single chip, or perhaps even that a processor and CMMU could be implemented on a chip. The VLSI implementation of the CMMU is critical to the eventual success of GANGLIA.

•  that GANGLIA should be implemented. Prototypes with CMMU's built from components (even simplified CMMU's) should be constructed. There are also questions about the physical and data link layer of the protocol that must be answered, and then implemented.
THIS APPENDIX provides descriptions of and informal justification for the calculations used to estimate the fragmentation of communication cycles in GANGLIA. Formal treatment of the problem can be found in a report by Clark, Mishra, and Spencer [Clar89b].

A.1     Spillover

Fragmentation in GANGLIA's communication cycles arises because, during a
communication cycle, the control node must stop polling for asynchronous
messages early enough to ensure that it will be able to start the next
cycle on time. Consider figure 7.1 on page 99, which shows the communication
cycle or frame. If $t_p$ is the length of a poll (this is also the length of
a message header) and $t_{max}$ is the maximum message size (header and
data), then the control node must stop polling $t_p + t_{max}$ seconds
before the end of the cycle to ensure that the message sent in response to
the poll does not prevent the control node from beginning the next cycle on
time. The dashed line in the figure represents the point $t_p + t_{max}$
seconds before the end of the cycle. The fragmentation in each cycle is not
equal to $t_p + t_{max}$, however, since a message or poll initiated before
the dashed line may continue transmitting into the interval between the
dashed line and the end of the cycle. Thus, the fragmentation in each cycle
is reduced by the amount that the last message in the cycle falls into this
interval. Calculating the amount of fragmentation thus becomes the problem
of calculating the amount that the last message in each cycle "spills over"
into this interval.

Figure A.1: Simplified "spillover" problem.
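
The cutoff rule described above can be sketched as a short simulation of one
asynchronous frame. This is only an illustration: the function name
`simulate_frame`, its parameters, the example numbers, and the assumption
that every poll is answered are hypothetical, and lengths are measured in
bytes rather than seconds.

```python
def simulate_frame(frame_len, t_p, t_max, data_lengths):
    """Return (polls issued, fragmentation) for one asynchronous frame."""
    cutoff = frame_len - (t_p + t_max)  # the "dashed line"
    now = 0
    polls = 0
    for t_d in data_lengths:
        if now > cutoff:          # too late to guarantee an on-time next cycle
            break
        now += t_p + (t_p + t_d)  # poll, then response header plus data
        polls += 1
    return polls, frame_len - now


polls, fragmentation = simulate_frame(1000, t_p=32, t_max=288,
                                      data_lengths=[36] * 20)
print(polls, fragmentation)  # 7 300
```

In this example the last exchange runs 20 bytes past the cutoff, so the
fragmentation is 320 - 20 = 300 bytes rather than the full 320.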

We will begin with a more abstract problem. Consider a bin full of sticks.
Each stick has an integral length, with a maximum length of $k$ units. We
draw sticks from the bin at random, one stick at a time. The supply of
sticks, for our purposes, is inexhaustible, so finding the bin empty is
never a problem. Now consider a line on the floor starting at the wall and
terminating at a mark which is $N$ units from the wall. Suppose we draw
sticks from the bin and place them end-to-end along the line, starting at
the wall, and stop when a stick covers the mark. The problem of message
spillover corresponds directly to the problem of determining how far the
last stick extends beyond the mark. The reader should note that, to be
consistent with the original problem, if the total length of the sticks is
exactly $N$ then one more stick is drawn and added to the line. In other
words, "covering the mark at $N$" means that the point $N + 1$ units from
the wall is reached. Figure A.1 shows the situation.

We will informally derive formulas for the spillover in two cases. The
first is the case when the lengths are uniformly distributed. The second is
for a general distribution of stick lengths.

A.1.1     Uniform Distribution

First assume that the lengths of the sticks are uniformly distributed from
1 to $k$. Further assume that $N$ is much larger than $k$. Then if we draw
sticks and line them up until the total length of the sticks exceeds $N$,
there will be approximately the same number of sticks of each size. Let $L$
be that number. Then

$$\sum_{i=1}^{k} iL \approx N \quad\text{and}\quad L \approx \frac{2N}{k(k+1)}$$

If we consider only sticks of length $i$ then they will cover approximately
$iL$ units of the line, so that

$$\Pr[\text{a stick of length } i \text{ covers the mark at } N] \approx \frac{iL}{N} = \frac{2i}{k(k+1)}$$

Now we can compute the expected length of the stick covering the mark.

$$\begin{aligned}
E[i \text{ covering mark at } N]
  &\approx \sum_{i=1}^{k} \frac{2i}{k(k+1)}\, i = \frac{2}{k(k+1)} \sum_{i=1}^{k} i^2 \\
  &= \frac{2}{k(k+1)} \cdot \frac{k(k+1)(2k+1)}{6} \approx \frac{2k}{3}
\end{aligned}$$

This means that if we were to repeatedly draw sticks and line them up we
would find that "typically" a stick approximately $2k/3$ units in length
would cover the mark $N$ units from the wall.

Now the question is: how much of the stick extends beyond the mark? It is
shown in Mishra et al. that for large $N$ the last stick extends beyond the
mark, on average, by half its length. So we would expect the spillover to
be about $k/3$ units.
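
The $k/3$ estimate can be checked numerically with a small Monte Carlo
sketch of the stick problem. The function names, the seed, and the values
of `n`, `k`, and `trials` are arbitrary choices made for illustration. The
estimate should land near $k/3$; small differences are expected because the
derivation above is approximate.

```python
import random


def spillover(n, k, rng):
    """Lay uniform sticks (lengths 1..k) end to end; return the overshoot past n."""
    total = 0
    while total <= n:              # total == n still draws one more stick
        total += rng.randint(1, k)
    return total - n


def mean_spillover(n, k, trials, seed=0):
    """Average the spillover over many independent lines of sticks."""
    rng = random.Random(seed)
    return sum(spillover(n, k, rng) for _ in range(trials)) / trials


estimate = mean_spillover(n=2000, k=30, trials=5000)
print(estimate, 30 / 3)
```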

A.1.2     General Distribution

In this section we consider the problem when the sticks have an arbitrary
distribution. More precisely, for $1 \le i \le k$ there is a
$0 \le p_i \le 1$ such that
$p_i = \Pr[\text{stick of length } i \text{ is drawn}]$. As in the last
section, we draw sticks until the mark at $N$ is covered, where $N \gg k$.
Only this time, the various stick sizes will not appear with the same
frequency. Let $L_i$ be the number of times that a stick of length $i$ is
placed in the line. Then

$$\sum_{i=1}^{k} iL_i \approx N$$

and for each $i$

$$L_i \approx p_i (L_1 + L_2 + \cdots + L_k)$$

Combining these we get

$$\begin{aligned}
N &\approx \sum_{i=1}^{k} iL_i \\
  &\approx \sum_{i=1}^{k} (L_1 + L_2 + \cdots + L_k)\, i p_i \\
  &= (L_1 + L_2 + \cdots + L_k) \sum_{i=1}^{k} i p_i \\
  &= (L_1 + L_2 + \cdots + L_k)\, E[\text{stick length}]
\end{aligned}$$

Let the random variable $S$ represent the stick length so that
$E[\text{stick length}] = E[S] = \mu_S$. Proceeding as we did in the
previous section we get

$$\begin{aligned}
\Pr[\text{a stick of length } i \text{ covers the mark at } N]
  &\approx \frac{iL_i}{N} \\
  &\approx \frac{i p_i (L_1 + L_2 + \cdots + L_k)}{\mu_S (L_1 + L_2 + \cdots + L_k)} \\
  &= \frac{i p_i}{\mu_S}
\end{aligned}$$

and

$$E[i \text{ covering mark at } N] \approx \sum_{i=1}^{k} \frac{i^2 p_i}{\mu_S} = \frac{E[S^2]}{\mu_S} = \frac{\mu_S^2 + \sigma_S^2}{\mu_S}$$
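
As a consistency check, the general expression $E[S^2]/\mu_S$ reduces to the
result of section A.1.1 when the lengths are uniform on 1 to $k$. The mean
and variance used below are the standard ones for a discrete uniform
distribution; they are not stated in the text above.

$$\mu_S = \frac{k+1}{2}, \qquad \sigma_S^2 = \frac{k^2-1}{12}, \qquad E[S^2] = \mu_S^2 + \sigma_S^2 = \frac{(k+1)(2k+1)}{6}$$

$$\frac{E[S^2]}{\mu_S} = \frac{(k+1)(2k+1)/6}{(k+1)/2} = \frac{2k+1}{3} \approx \frac{2k}{3}$$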

With a general distribution of stick sizes it is not true that the spillover
will be half of $E[i \text{ covering mark at } N]$. Consider the case when
the sticks are all the same size. If the sticks are all of size $j$, the
spillover is exactly $j - (N \bmod j)$ (which is $j$ when $N \bmod j = 0$,
since one more stick is then drawn), and this is not $j/2$ in general.
However, in the report it is shown that for many distributions it is
reasonable to say that

$$E[\text{spillover}] = E[i \text{ covering mark at } N]/2 = \frac{\mu_S}{2} + \frac{\sigma_S^2}{2\mu_S} \qquad \text{(A.1)}$$

A.2     Message Fragmentation

In the previous section we developed, informally, a formula for the expected
spillover in the abstract problem of placing sticks on a line. The end
result is equation A.1. Messages in GANGLIA have a form which allows us to
further refine the estimate of the spillover. In the communication cycles
the "stick" corresponds to the poll-response sequence. During the
asynchronous frame the control node transmits a poll message which is $t_p$
bytes long. The polled node must respond with a message whose header is also
$t_p$ bytes long and whose data portion, $t_D$, is zero to $D_{max}$ bytes
long (where $D_{max}$ is the maximum size for the data portion of a message,
which is equal to $t_{max} - t_p$). Thus the "length" of the stick is
$t_S = 2t_p + t_D$, where $t_p$ is a constant and $0 \le t_D \le D_{max}$.
To calculate $E[\text{spillover}]$ we need $\mu_S$ and $\sigma_S^2$. The
mean, $\mu_S = E[t_S]$, is simply $2t_p + E[t_D]$, or $2t_p + \mu_D$, since
$t_p$ is a constant. The variance $\sigma_S^2$ is $E[t_S^2] - \mu_S^2$ where

$$\begin{aligned}
E[t_S^2] &= E[(2t_p + t_D)^2] \\
         &= 4t_p^2 + 4t_p\mu_D + E[t_D^2]
\end{aligned}$$

Substituting $\mu_S^2 = 4t_p^2 + 4t_p\mu_D + \mu_D^2$ and simplifying we get

$$\sigma_S^2 = E[t_S^2] - \mu_S^2 = E[t_D^2] - \mu_D^2 = \sigma_D^2$$

And finally,

$$E[\text{spillover}] = \frac{2t_p + \mu_D}{2} + \frac{\sigma_D^2}{2(2t_p + \mu_D)} \qquad \text{(A.2)}$$

In GANGLIA the header is 32 bytes long. For purposes of analysis we will
assume the data portion of the messages is exponentially distributed with a
mean length of 32 bytes. Thus $\mu_S = 96$ bytes and
$\sigma_S^2 = \sigma_D^2 = (32)^2$, and we get
$E[\text{spillover}] = 53.33$ bytes.
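
The arithmetic behind this figure can be reproduced with a few lines of
Python. The function name `expected_spillover` and its parameter names are
chosen here for illustration; the formula is the expected spillover
$\mu_S/2 + \sigma_D^2/(2\mu_S)$ with $\mu_S = 2t_p + \mu_D$.

```python
def expected_spillover(t_p, mean_data, var_data):
    """Expected spillover: mu_S / 2 + sigma_D^2 / (2 * mu_S), mu_S = 2*t_p + mu_D."""
    mu_s = 2 * t_p + mean_data
    return mu_s / 2 + var_data / (2 * mu_s)


# An exponential data length with mean 32 bytes has variance 32**2.
value = expected_spillover(t_p=32, mean_data=32, var_data=32 ** 2)
print(round(value, 2))  # 53.33
```

The two terms are 96/2 = 48 bytes and 1024/192 = 5.33 bytes.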

Two further notes should be made. The first note is that in a GANGLIA
system the lengths of the asynchronous frames are not all the same (i.e.
$N$ is not a constant). This adds another random variable to the analysis.
However, there are a small number of different cycle lengths, so if detailed
analysis were desired for a particular system the analysis could be applied
to each frame size separately. The second note is that in GANGLIA systems it
is not clear that $N$ is large (an assumption that makes the above
derivations go more smoothly). If one considers the "maximal system" example
from section 7.2 in the text, it is quite possible that during some cycles
there is only time for one or two polls during the asynchronous frame.
However, in the absence of real data and lacking more powerful analysis
tools we will use the equations derived in this appendix.